# Computer Science Colloquium

Organiser: Tom Gur

Colloquia currently take place online. Additional titles and abstracts will be added as details become available.

##### Next Colloquium
###### 4 February 2021

14:00 - 15:00

Title: Local computation algorithms

Abstract: Consider a setting in which the inputs to and outputs from a computational problem are so large that there is not enough time to read them in their entirety. However, if one is only interested in small parts of the output at any given time, is it really necessary to solve the entire computational problem? Is it even necessary to view the whole input? We survey recent work in the model of "local computation algorithms", which, for a given input, supports queries by a user to the values of specified bits of a legal output. The goal is to design local computation algorithms in such a way that very little of the input needs to be seen in order to determine the value of any single bit of the output. Though this model describes sequential computations, techniques from local distributed algorithms have been extremely important in designing efficient local computation algorithms. In this talk, we describe results on a variety of problems for which sublinear time and space local computation algorithms have been developed -- we will give special focus to finding maximal independent sets and generating random objects.

Bio: Ronitt Rubinfeld is the Edwin Sibley Webster Professor of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, and is on the faculty at Tel Aviv University. Her research interests include sublinear-time algorithms, local algorithms and algorithms for testing discrete distributions over large domains. Ronitt Rubinfeld has been an ONR Young Investigator, a Sloan Fellow, an invited speaker at the International Congress of Mathematicians (2006) and an ACM Fellow.
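As an illustrative aside, the flavour of a local computation algorithm can be sketched in a few lines of Python. The toy below answers queries of the form "is vertex v in the maximal independent set?" for the greedy MIS induced by a random vertex ordering, exploring only lower-ranked vertices per query; the graph, function names and seed are invented for the example, and this is a sketch of the general idea rather than the algorithms from the talk.

```python
import random

def make_mis_oracle(neighbors, seed=0):
    """Query access to a maximal independent set: the greedy MIS under
    a random vertex ordering. Answering a query for v only explores
    vertices of rank lower than v's, not the whole graph."""
    rng = random.Random(seed)
    rank = {v: rng.random() for v in neighbors}
    memo = {}

    def in_mis(v):
        if v not in memo:
            # v joins the MIS iff no lower-ranked neighbour joined before it.
            memo[v] = all(not in_mis(u)
                          for u in neighbors[v] if rank[u] < rank[v])
        return memo[v]

    return in_mis

# Toy graph: a 4-cycle, queried one vertex at a time.
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
query = make_mis_oracle(graph)
mis = {v for v in graph if query(v)}
```

On a large sparse graph, a single query typically touches only a small neighbourhood of the queried vertex, which is the point of the local-computation model.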

##### Current Term

Second term, 2020-2021:

###### 21 January 2021

14:00 - 15:00

Title: Flash Loans for Fun and Profit

Abstract: Credit allows a lender to loan out surplus capital to a borrower. In the traditional economy, credit bears the risk that the borrower may default on its debt; the lender hence requires upfront collateral from the borrower, plus interest payments. Due to the atomicity of blockchain transactions, lenders can offer flash loans, i.e., loans that are only valid within one transaction and must be repaid by the end of that transaction. This concept has led to a number of interesting attack possibilities, some of which were exploited in February 2020. In this talk we will explore the implications of transaction atomicity and flash loans for the nascent decentralized finance (DeFi) ecosystem. We will analyze two existing attacks with ROIs beyond 500k% and formulate the search for attack parameters as an optimization problem over the state of the underlying Ethereum blockchain and the state of the DeFi ecosystem. We will show how malicious adversaries can efficiently maximize an attack profit and hence damage the DeFi ecosystem further. Specifically, we will present how two previously executed attacks can be “boosted” to result in a profit of 829.5k USD and 1.1M USD, respectively, which is a boost of 2.37× and 1.73×, respectively.

Bio: Arthur Gervais is a Lecturer (equivalent to Assistant Professor) at Imperial College London, and the founder of Liquidity Network. He is passionate about information security and has worked on blockchain-related topics since 2012, with a recent focus on Decentralized Finance (DeFi).

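As a toy illustration of the transaction atomicity the abstract relies on (not code from the talk), the sketch below models a flash loan as an operation that is rolled back unless the principal is returned within the same "transaction"; the class and method names are invented for the example, and interest is omitted for simplicity.

```python
class RevertError(Exception):
    """Raised to undo a transaction, mimicking blockchain revert semantics."""

class LendingPool:
    def __init__(self, liquidity):
        self.liquidity = liquidity

    def flash_loan(self, amount, borrower):
        """Lend `amount` for the duration of a single atomic transaction.
        If the borrower fails to return at least the principal, the whole
        transaction is rolled back, so the lender bears no default risk."""
        before = self.liquidity
        self.liquidity -= amount
        try:
            repaid = borrower(amount)  # borrower uses the funds here
            self.liquidity += repaid
            if self.liquidity < before:
                raise RevertError("loan not repaid in full")
        except RevertError:
            self.liquidity = before    # atomic rollback of all effects
            raise

pool = LendingPool(1_000_000)

# A well-behaved borrower repays the principal within the transaction.
pool.flash_loan(500_000, lambda amount: amount)
assert pool.liquidity == 1_000_000
```

Because a defaulting borrower triggers a rollback rather than a loss, no upfront collateral is needed, which is exactly what makes flash-loan-funded attacks cheap to attempt.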
###### 4 February 2021

14:00 - 15:00

Title: Local computation algorithms

Abstract: Consider a setting in which the inputs to and outputs from a computational problem are so large that there is not enough time to read them in their entirety. However, if one is only interested in small parts of the output at any given time, is it really necessary to solve the entire computational problem? Is it even necessary to view the whole input? We survey recent work in the model of "local computation algorithms", which, for a given input, supports queries by a user to the values of specified bits of a legal output. The goal is to design local computation algorithms in such a way that very little of the input needs to be seen in order to determine the value of any single bit of the output. Though this model describes sequential computations, techniques from local distributed algorithms have been extremely important in designing efficient local computation algorithms. In this talk, we describe results on a variety of problems for which sublinear time and space local computation algorithms have been developed -- we will give special focus to finding maximal independent sets and generating random objects.

Bio: Ronitt Rubinfeld is the Edwin Sibley Webster Professor of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, and is on the faculty at Tel Aviv University. Her research interests include sublinear-time algorithms, local algorithms and algorithms for testing discrete distributions over large domains. Ronitt Rubinfeld has been an ONR Young Investigator, a Sloan Fellow, an invited speaker at the International Congress of Mathematicians (2006) and an ACM Fellow.

14:00 - 15:00

Title: TBA
Abstract: TBA

15:00 - 16:00

Title: TBA
Abstract: TBA

##### Past terms

First term, 2020-2021:

###### 29 October 2020

14:00 - 15:00

Title: A primer on PAC-Bayesian learning, followed by News from the PAC-Bayes frontline

Abstract: PAC-Bayes is a generic and flexible framework for addressing the generalisation abilities of machine learning algorithms. It leverages the power of Bayesian inference and allows one to derive new learning strategies. I will briefly present the key concepts of PAC-Bayes and illustrate a few of its recent successes, for deep neural networks, coherent risk measures, contrastive learning and robust learning.

Bio: Benjamin Guedj is a Principal Research Fellow (~Associate Professor) at University College London (Centre for Artificial Intelligence and Department of Computer Science). He is also a tenured research scientist at Inria, the top French research institute in mathematics and computer science, and a visiting researcher with the Alan Turing Institute. Since Sept. 2020, he has been the founder and scientific director of The Inria London Programme, a joint lab between France and the UK. He holds a PhD in mathematics from Sorbonne Université (Paris, France) and focuses on statistical learning theory, PAC-Bayes, computational statistics and theory for deep learning, among other topics.
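As an illustrative aside, one standard PAC-Bayesian result (a form of Maurer's bound, not necessarily the formulation covered in the talk) states that for any prior $\pi$, with probability at least $1-\delta$ over an i.i.d. sample of size $n$, simultaneously for all posteriors $\rho$:

```latex
\operatorname{kl}\!\left(\widehat{R}_n(\rho) \,\middle\|\, R(\rho)\right)
  \;\le\; \frac{\operatorname{KL}(\rho \,\|\, \pi) + \ln\frac{2\sqrt{n}}{\delta}}{n}
```

where $\widehat{R}_n(\rho)$ and $R(\rho)$ are the empirical and true $\rho$-averaged risks and $\operatorname{kl}$ is the binary KL divergence; bounds of this shape are what "PAC-Bayes" refers to in the abstract.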

###### 19 November 2020

17:00 - 18:00

Title: Testing noisy linear functions for sparsity

Abstract: This talk explores a question at the intersection of two well-studied topics: (1) sparse recovery, and (2) property testing. Consider the following basic sparse recovery problem: An algorithm is given noisy input-output examples of a linear function over a high-dimensional space. Given the promise that the linear function is sparse, breakthrough work of Candes, Romberg and Tao shows that the linear function can be recovered from a number of examples that scales logarithmically in the dimension (and this dependence is required). We look at this problem from the vantage point of property testing, by studying the complexity of merely determining whether the unknown linear function is sparse versus "far from" sparse. We show that this decision problem can be solved with a number of samples which is completely independent of the dimension if the background high-dimensional distribution of samples has i.i.d. components that are not Gaussian.

Bio: Rocco Servedio is a professor and chair of the Department of Computer Science at Columbia University. His research is in theoretical computer science, and his main research interests lie in computational complexity theory, computational learning theory, property testing, and the role of randomness in computation.

###### 26 November 2020

14:00 - 15:00

Title: Neural Computing, Identity, and Visual Recognition of Persons

Abstract: Brains and computers have long been used as metaphors for each other. In recent years the link has grown stronger, with the explosion of results in machine learning based on big multi-layer networks, and with data from wet neurobiology showing that each cubic millimeter (mm^3) of brain tissue contains about 3 kilometers of "wiring" (neural connections). This talk discusses vision both in machines and in mammals, focusing on pattern encoding mechanisms and their efficiency. The main application areas discussed are face recognition and the IrisCode, which has recently been used to enroll the entire 1.3 billion population of India in a national ID system. Unifying theoretical ideas are entropy, and visual encoders that are optimal under the Uncertainty Principle.

Bio: John Daugman received his degrees at Harvard University and then taught at Harvard before coming to Cambridge University, where he is Professor of Computer Vision and Pattern Recognition. His areas of research include computer vision, information theory, neuro computing and statistical pattern recognition. Awards for his work in science and technology include the Information Technology Award and Medal of the British Computer Society, the "Time 100" Innovators Award, and the OBE, Order of the British Empire. He is the founder and benefactor of the Cambridge Chrysalis Trust.

###### 10 December 2020

14:00 - 15:00

Title: Higher-order interactions in statistical physics, machine learning and biomedicine

Abstract: The problem of inferring pairwise and higher-order interactions in complex systems involving large numbers of interacting variables, from observational data, is fundamental to many fields. Known to the statistical physics community as the inverse problem, it has become accessible in recent years due to real and simulated big data being generated. In the first part of this talk, we discuss extracting interactions form data using a neural network approach, namely the Restricted Boltzmann Machine. In the second part, we discuss a model-independent and unbiased estimator of symmetric interactions for any system of binary and categorical variables, be it magnetic spins, nodes in a neural network, or gene networks in biology. The generality of this technique is demonstrated analytically and numerically in various examples.

Bio: Ava Khamseh received her PhD in Theoretical Particle Physics from the University of Edinburgh in 2017. She then became a Cross-Disciplinary Fellow (XDF) at the Institute of Genetics and Molecular Medicine at the University of Edinburgh. She is currently a Lecturer in Biomedical Artificial Intelligence, joint between the School of Informatics and IGMM. Her research involves designing experiments and developing causal mathematical, statistical and machine learning methods for applications to cancer biology and large-scale population biomedicine.

Third term, 2019-2020:

###### 2 July 2020

14:00 - 15:00

Title: Blockchain Scalability and Privacy via Scalable and Transparent zk-STARKs

Abstract: Two of the biggest challenges of blockchains like Bitcoin and Ethereum are (1) scalability - growing throughput 10x, 100x, 1000x, ... - and (2) maintaining privacy. Scalable ZK proof systems can solve both problems. This talk will recount the theory-to-practice journey of ZK proofs and describe questions in algebraic coding theory whose resolution will help improve scalability and privacy even further.

Bio: Eli is a co-founder and president of StarkWare, and Chairman of its Board of Directors. Eli has been passionate about the theory (“moon math”) and realization of transparent computational integrity since 2001, when he was a post-doctoral researcher at MIT and Harvard University, after completing his PhD in CS at the Hebrew University. Prior to co-founding StarkWare, Eli was a Professor of Computer Science at Technion – Israel Institute of Technology, which he joined in 2005. He is also a co-inventor of the Zerocash decentralized anonymous payment system and a founding scientist of the Electric Coin Company (aka the Zcash company).

###### 18 June 2020

15:00 - 16:00

Title: Are All Features Created Equal?

Abstract: Our machine learning models have attained impressive accuracy on many benchmark tasks. Yet, these models remain remarkably brittle---small perturbations of natural inputs can completely degrade their performance. Why is this the case? In this talk, we show that this brittleness can, in part, be attributed to the fact that our models often make decisions in a very different way than humans do. Viewing neural networks as feature extractors, we study how features extracted by neural networks may diverge from those used by humans, and how adversarially robust models seem to make progress towards bridging this gap.

Bio: Aleksander Madry is a Professor of Computer Science in the MIT EECS Department and a principal investigator in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). He received his PhD from MIT in 2011 and, prior to joining the MIT faculty, he spent some time at Microsoft Research New England and on the faculty of EPFL. Aleksander’s research interests span algorithms, continuous optimization, the science of deep learning and understanding machine learning from a robustness perspective. His work has been recognized with a number of awards, including an NSF CAREER Award, an Alfred P. Sloan Research Fellowship, an ACM Doctoral Dissertation Award Honorable Mention, and the 2018 Presburger Award.

Second term, 2019-2020:

###### 16 January 2020

14:00 - 15:00

MSB2.22

Title: Automated Fact Checking: a natural language processing perspective

Abstract:
Fact checking is the task of verifying a claim against sources such as knowledge bases and text collections. While this task has long been of great importance for journalism, it has recently become of interest to the general public as one of the weapons against misinformation. In this talk, I will first discuss the task and what the expectations of automated methods for it should be. Following this, I will present our approach for fact checking simple numerical statements, which we were able to learn without explicitly labelled data. Then I will describe how we automated part of the manual process of the debunking website emergent.info, which later evolved into the Fake News Challenge with 50 participants. Finally, I will present the Fact Extraction and Verification shared task, which took place in 2018, as well as the recent second edition, which conducted an adversarial evaluation of the proposed approaches.

###### 4 February 2020

10:00 - 11:00
Unusual room: R0.21

Title: Reading News with Maps by Exploiting Spatial Synonyms

Abstract:
NewsStand is an example application of a general framework to enable people to search for information using a map query interface, where the information results from monitoring the output of over 10,000 RSS news sources and is available for retrieval within minutes of publication. The advantage of doing so is that a map, coupled with an ability to vary the zoom level at which it is viewed, provides an inherent granularity to the search process that facilitates an approximate search. This distinguishes it from today's prevalent keyword-based conventional search methods that provide a very limited facility for approximate searches and which are realized primarily by permitting a match via use of a subset of the keywords. However, it is often the case that users do not have a firm grasp of which keyword to use, and thus would welcome the search to also take synonyms into account. For queries to spatially-referenced data, the map query interface is a step in this direction as the act of pointing at a location (e.g., by the appropriate positioning of a pointing device) and making the interpretation of the precision of this positioning specification dependent on the zoom level is equivalent to permitting the use of spatial synonyms (i.e., letting spatial proximity play a role rather than only seeking an exact match of a query string). Of course, this is all predicated on the use of a textual specification of locations rather than a geometric one, which means that one must deal with the potential for ambiguity.
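As an illustrative aside, the zoom-dependent interpretation of a map click described above can be sketched in a few lines; the gazetteer, coordinates and radius rule below are invented for the example and are not NewsStand's actual implementation.

```python
from math import hypot

# Hypothetical gazetteer: place name -> (x, y) map coordinates.
GAZETTEER = {
    "college park": (0.0, 0.0),
    "greenbelt":    (1.0, 0.5),
    "baltimore":    (30.0, 25.0),
}

def spatial_synonyms(click, zoom_level, places=GAZETTEER):
    """Interpret a map click as a fuzzy spatial query: the deeper the
    zoom, the tighter the matching radius, so nearby places behave as
    'spatial synonyms' of the clicked location instead of requiring an
    exact string match."""
    radius = 100.0 / (2 ** zoom_level)  # halve the tolerance per zoom step
    return sorted(name for name, (x, y) in places.items()
                  if hypot(x - click[0], y - click[1]) <= radius)

spatial_synonyms((0, 0), zoom_level=5)  # loose radius: both nearby places
spatial_synonyms((0, 0), zoom_level=8)  # tight radius: the exact area only
```

The zoom level plays the role of the precision of the positioning specification: the same click retrieves a broader or narrower set of locations depending on how far in the user has zoomed.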

###### 6 February 2020
14:00 - 15:00
MSB2.22

Title: Multiparty Privacy and AI Security & Privacy

Abstract:
In this talk, I will provide an overview of two problems I have been contributing to in the past years: i) Multiparty Privacy; ii) AI Security & Privacy. Regarding i), privacy is not just about what an individual user discloses about herself; it also involves what others (e.g. her friends) may disclose about her. Multiparty privacy is concerned with information pertaining to several individuals and the conflicts that arise when the privacy preferences of these individuals differ. In particular, I will go through the research we have been doing on multiparty privacy in a social media context and the future challenges in this field in general. Regarding ii), I will be talking about the work we have been doing at the intersection of HCI and AI on security and privacy in AI-based systems, including privacy threats in AI-based systems, users’ perceptions of security and privacy in AI-based systems, socio-technical approaches to tackle AI-based discrimination against particular types of users (e.g. based on gender/ethnicity), and symbolic AI approaches that can help towards addressing security, privacy and transparency issues in AI-based systems.

###### 27 February 2020
14:00 - 15:00

Title: Modulated Bayesian Optimisation

Abstract:
Bayesian optimization (BO) methods often rely on the assumption that the objective function is well-behaved, but in practice, this is seldom true. Common approaches, which try to model the objective as precisely as possible, often fail to make progress by spending too many evaluations modelling irrelevant details. In this talk, I will introduce a set of surrogate models that focus on the details that are informative for search while ignoring detrimental structure that is challenging to model from a few observations. We demonstrate that surrogate models with appropriate noise distributions can absorb challenging structures in the objective function by treating them as irreducible uncertainty.

###### 12 March 2020
14:00 - 15:00

Title: Testing Discrete Distributions -- a CS Retrospective

Abstract: Making inferences about the distributional properties of data is arguably one of the most fundamental classes of data analysis tasks. The study of such inferences has been a scientific endeavour for a long time, typically within statistics and information theory. In the last two decades, testing problems for distributions, especially on discrete domains, have also been studied in the theoretical computer science community, often from a slightly different perspective. The primary objective in this line of research is characterising the number of samples needed, in terms of the domain size or some function of the distributions involved, to perform a given inference task reliably. In this talk, I will present an overview of this very active field of research, describing some basic models and problems, landmark results, and future directions.
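As an illustrative aside, one of the simplest testers in this line of work, the collision-based uniformity test, can be sketched as follows; the threshold and sample sizes here are illustrative choices for the example, not the tight bounds from the literature.

```python
import random
from collections import Counter

def collision_rate(samples):
    """Fraction of sample pairs that agree. Its expectation is the
    squared 2-norm of the distribution, which is minimised (= 1/n)
    exactly by the uniform distribution on n elements."""
    m = len(samples)
    collisions = sum(c * (c - 1) // 2 for c in Counter(samples).values())
    return collisions / (m * (m - 1) / 2)

def looks_uniform(samples, n, slack=1.5):
    """Declare 'uniform' when the collision rate is close to 1/n;
    distributions far from uniform show noticeably more collisions."""
    return collision_rate(samples) <= slack / n

rng = random.Random(0)
n = 50
uniform = [rng.randrange(n) for _ in range(2000)]
skewed = [rng.randrange(5) for _ in range(2000)]  # supported on 5 elements

looks_uniform(uniform, n)  # the near-uniform sample passes
looks_uniform(skewed, n)   # the concentrated sample is rejected
```

The number of samples a tester like this needs grows only with the square root of the domain size, which is the kind of sample-complexity characterisation the abstract refers to.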

First term, 2019-2020:

###### 10 October 2019

14:00 - 15:00

###### Sunil Prabhakar (Purdue University)
Title: Database Fidelity Without Trust

Abstract:
Ensuring the trustworthiness of data retrieved from a database is of utmost importance to users. Databases are typically subjected to updates from multiple clients, often at very high rates. The correctness of data stored in a database is defined by the faithful execution of only valid (authorized) transactions. In this talk we address the question of whether it is necessary to trust a database server in order to trust the data retrieved from it. The lack of trust arises naturally if the database server is owned by a third party, as in the case of cloud computing, if it is likely that the server may have been compromised, or if there is a malicious insider.

14:00 - 15:00

###### Evangelia Kalyvianaki (Cambridge)

Title: THEMIS -- Fairness in Federated Stream Processing under Overload

Abstract:
Federated stream processing systems, which utilise nodes from multiple independent domains, can be found increasingly in multi-provider cloud deployments, internet-of-things systems, collaborative sensing applications and large-scale grid systems. To pool resources from several sites and take advantage of local processing, submitted queries are split into query fragments, which are executed collaboratively by different sites. When supporting many concurrent users, however, queries may exhaust available processing resources, thus requiring constant load shedding. Given that individual sites have autonomy over how they allocate query fragments on their nodes, it is an open challenge how to ensure global fairness in the processing quality experienced by queries in a federated scenario.

14:00 - 15:00
###### Ata Kaban (Birmingham)

Title: Uncovering structure with randomness for learning in high dimensions

Abstract:
Random projection (RP) is a simple, computationally efficient linear dimensionality reduction technique with interesting theoretical properties for learning from data. For instance, it can regularise ill-conditioned covariance matrices, and it preserves dot products between vectors and their signs to an extent that depends on their cosine similarity. It can speed up computations when dealing with high-dimensional data sets, while retaining control of generalisation performance. Furthermore, in this talk we discuss how RP can also be used as an analytic tool to better understand why learning from high-dimensional data is not always as difficult as it looks. We present generalisation bounds where the use of RP highlights how learning can take advantage of the presence of certain benign geometric structures in the problem, and yields conditions that reduce the dimension-dependence of error guarantees in settings where such dependence is known to be essential in general.
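As an illustrative aside, the basic RP construction the abstract builds on can be sketched in a few lines of Python; the dimensions and tolerances below are arbitrary choices for the example, and real applications would use an optimised linear algebra library.

```python
import math
import random

def random_projection(vectors, k, seed=0):
    """Map d-dimensional vectors to k dimensions using a matrix of
    i.i.d. Gaussian entries scaled by 1/sqrt(k); in expectation this
    preserves norms and dot products, with roughly 1/sqrt(k) relative
    error in practice."""
    rng = random.Random(seed)
    d = len(vectors[0])
    R = [[rng.gauss(0, 1) / math.sqrt(k) for _ in range(d)]
         for _ in range(k)]
    return [[sum(row[i] * v[i] for i in range(d)) for row in R]
            for v in vectors]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Project two random 1000-dimensional vectors down to 200 dimensions.
rng = random.Random(1)
u = [rng.gauss(0, 1) for _ in range(1000)]
v = [rng.gauss(0, 1) for _ in range(1000)]
pu, pv = random_projection([u, v], 200)
```

Crucially, the projection matrix is drawn without looking at the data, which is what makes RP both cheap and amenable to the kind of theoretical analysis the talk discusses.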

14:00 - 15:00
###### Samson Abramsky (Oxford)

Title: Non-classicality, quantum resources and quantum advantage

Abstract:
A key objective in the field of quantum information and computation is to understand the advantage which can be gained in information processing tasks by the use of quantum resources. While a range of examples has been studied, to date a systematic understanding of quantum advantage is lacking.

Our focus here is on quantum resources which take the form of non-local, or more generally *contextual*, correlations. Contextuality is one of the key signatures of non-classicality in quantum mechanics and has been shown to be a necessary ingredient for quantum advantage in a range of information processing tasks. We will describe a notion of simulation between quantum resources, and more generally between resources described in terms of contextual correlations, in the "sheaf-theoretic" framework for contextuality (Abramsky-Brandenburger). The notion of simulation is expressed as a morphism of empirical models, in a form which allows the behaviour of one set of correlations to be simulated in terms of another using classical processing and shared randomization.

The talk will be essentially self-contained, introducing the various notions we will discuss.

15:00 - 16:00
###### Jakob Nikolas Kather (University Hospital Aachen)

Title: Deep Learning for Precision Oncology

Abstract:
Precision oncology requires molecular and genetic testing of tumour tissue. For many tests, widespread implementation into clinical practice is limited because these biomarkers are costly, require significant expertise and are limited by tissue availability. However, virtually every cancer patient gets a biopsy as part of the diagnostic workup and this tissue is routinely stained with hematoxylin and eosin (HE). We have developed a deep learning-based technology to predict molecular features, prognosis and markers for treatment response directly from routine histology. This talk will summarize the state of the art in deep learning histopathology, demonstrate emerging use cases and discuss the clinical implications of deep learning-based molecular testing of solid tumours.

Third term, 2018-2019:

14:00 - 15:00

###### Sarah Meiklejohn (UCL)

Title: Anonymity in Cryptocurrencies

(This talk will take place in CS1.04)

Abstract: A long line of recent research has demonstrated that existing cryptocurrencies often do not achieve the level of anonymity that users might expect, while at the same time another line of research has worked to increase the level of anonymity by adding new features to existing cryptocurrencies or creating entirely new cryptocurrencies. This talk will explore both of these lines of research, demonstrating both de-anonymization attacks and techniques for anonymity that achieve provably secure guarantees.

14:00 - 15:00

###### Edith Elkind (Oxford)

Title: Fair Division of a Graph

Abstract: We consider fair allocation of indivisible items under an additional constraint: there is an undirected graph describing the relationship between the items, and each agent's share must form a connected subgraph of this graph. This framework captures, e.g., fair allocation of land plots, where the graph describes the accessibility relation among the plots. We focus on agents that have additive utilities for the items, and consider several common fair division solution concepts, such as proportionality, envy-freeness and the maximin share guarantee. In particular, we prove that for acyclic graphs a maximin share allocation always exists and can be computed efficiently. We also discuss a continuous variant of this problem, where agents need to divide a collection of interconnected "cakes" and each agent's piece has to be connected, or, more generally, have a small number of connected components.

Based on joint work with Sylvain Bouveret, Katarina Cehlarova, Ayumi Igarashi, Dominik Peters, Xiaohui Bei and Warut Suksompong.
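As an illustrative aside, when the graph is a path an agent's maximin share under the connectivity constraint can be computed by brute force over the cut positions; the item values below are invented for the example, and the efficient algorithm from the talk would use dynamic programming instead.

```python
from itertools import combinations

def maximin_share(values, n_agents):
    """An agent's maximin share on a path graph: the best, over all ways
    of cutting the path into n_agents contiguous pieces, of the value of
    the worst piece (each bundle must be a connected subpath)."""
    m = len(values)
    best = 0
    for cuts in combinations(range(1, m), n_agents - 1):
        bounds = (0,) + cuts + (m,)
        pieces = [sum(values[a:b]) for a, b in zip(bounds, bounds[1:])]
        best = max(best, min(pieces))
    return best

maximin_share([3, 1, 4, 1, 5, 9], 2)  # → 9, e.g. via the split [3,1,4,1] | [5,9]
```

An allocation satisfies the maximin share guarantee when every agent receives a connected bundle worth at least this value to her.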

14:00 - 15:00

###### Andreas Damianou (Amazon, Cambridge)

(This talk will take place in CS1.04)

Title: Probability and uncertainty in Deep Learning

Abstract:
In this talk I motivate deep learning from the perspective of probability and uncertainty. The resulting models can maintain and, crucially, communicate their degree of belief regarding their states (e.g. how certain they are in their predictions) using Bayesian inference. I will also highlight the deep Gaussian process family of approaches, which can be seen as a means of performing probabilistic, Bayesian deep learning directly in the space of functions, rather than in the space of finite parameters. Prior knowledge of deep learning is not essential to follow this talk; in fact, I will offer a quick overview at the beginning.

14:00 - 15:00
###### Mario Berta (Imperial College London)

Title: Quantum Technologies for Cryptography

Abstract:
The rise of quantum technologies opens exciting possibilities for performing cryptographic tasks, but it also poses new threats. On the one hand, quantum mechanical devices are the basis of a variety of novel cryptographic protocols, such as key distribution and secure identification, which offer a significant advantage over standard routines. On the other hand, quantum computers will render some existing cryptographic schemes insecure. The interplay between these two aspects is best understood by analysing the power of adversaries that have access to a quantum computer - quantum adversaries. In my talk, I will discuss methods to analyse the power of quantum adversaries which shed light on both aspects.

11:00 - 12:00

###### Radu Calinescu (University of York)

Title: Learning, synthesis and efficient analysis of probabilistic models

Abstract:
Probabilistic modelling is a powerful tool for establishing performance, dependability and other quantitative properties of systems and processes during design, verification and at runtime. However, the usefulness of this tool depends on the accuracy of the models being analysed, on the efficiency of the analysis, and on the ability to find models corresponding to effective system and process architectures and configurations. This talk will describe how recent approaches to probabilistic model learning, analysis and synthesis address major challenges posed by these prerequisites, significantly advancing the applicability of probabilistic modelling.