Publications

Check out our website with blogposts presenting some of our research:

[HYPERBOLICDEEPLEARNING.COM]

PREPRINTS:


12. Bloom Origami Assays: Practical Group Testing

(Under review)

With: Louis Abraham, Benjamin Coleman, Anshumali Shrivastava, Bernhard Schölkopf, Alexander J Smola

[pdf, software]

Abstract. We study the problem usually referred to as group testing in the context of COVID-19. Given n samples collected from patients, how should we select and test mixtures of samples to maximize information and minimize the number of tests? Group testing is a well-studied problem with optimal solutions, but recent biological studies impose practical constraints for COVID-19 that are incompatible with traditional methods. In this new setting, we obtain strong solutions for small values of n using evolutionary strategies. We then develop a new method combining Bloom filters with belief propagation to scale to larger values of n (more than 100) with good empirical results, and present a more accurate decoding algorithm. This results in a new practical, multiplex method yielding strong empirical performance without mixing more than a chosen number of patients into the same probe. Finally, we briefly discuss adaptive methods, casting them into the framework of adaptive submodularity.
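
To illustrate the setting (not the paper's Bloom-filter/belief-propagation method), here is a minimal sketch of non-adaptive group testing with a cap on how many patients are mixed into one probe, decoded with the simple COMP rule; all sizes and names below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_pooling_design(n_samples, n_pools, max_per_pool, rng):
    """Random 0/1 pooling design; each pool mixes at most `max_per_pool` patients."""
    A = np.zeros((n_pools, n_samples), dtype=bool)
    for pool in range(n_pools):
        size = rng.integers(1, max_per_pool + 1)
        members = rng.choice(n_samples, size=size, replace=False)
        A[pool, members] = True
    return A

def run_tests(A, infected):
    """A pool tests positive iff it contains at least one infected sample."""
    return (A & infected).any(axis=1)

def comp_decode(A, results):
    """COMP rule: anyone appearing in a negative pool is declared negative,
    everyone else is flagged as (possibly) positive."""
    cleared = A[~results].any(axis=0)
    return ~cleared

n, pools, cap = 100, 30, 8                      # hypothetical sizes
infected = np.zeros(n, dtype=bool)
infected[rng.choice(n, size=3, replace=False)] = True

A = random_pooling_design(n, pools, cap, rng)
estimate = comp_decode(A, run_tests(A, infected))
print("missed positives:", int((infected & ~estimate).sum()),
      "| false positives:", int((~infected & estimate).sum()))
```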


11. Optimal Transport Graph Neural Networks

(Under review)

With: Octavian-Eugen Ganea, Benson Chen, Regina Barzilay, Tommi Jaakkola

[pdf, github, slides]

Abstract. Current graph neural network (GNN) architectures naively average or sum node embeddings into an aggregated graph representation, potentially losing structural or semantic information. We here introduce OT-GNN, which computes graph embeddings from optimal transport distances between the set of GNN node embeddings and “prototype” point clouds acting as free parameters. This allows different prototypes to highlight key facets of different graph subparts. We show that our function class on point clouds satisfies a universal approximation theorem, a fundamental property which is lost by sum aggregation. Nevertheless, empirically the model has a natural tendency to collapse back to the standard aggregation during training. We address this optimization issue by proposing an efficient noise contrastive regularizer, steering the model towards truly exploiting the optimal transport geometry. Our model consistently exhibits better generalization performance on several molecular property prediction tasks, while also yielding smoother representations.
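
As an illustration of the aggregation idea, the sketch below computes a differentiable optimal transport cost between a graph's node embeddings and one learnable prototype point cloud, using entropic (Sinkhorn) regularization in the log domain; the regularization, iteration count, and variable names are assumptions, not the paper's exact implementation.

```python
import math
import torch

def sinkhorn_cost(x, prototype, eps=1.0, n_iters=100):
    """Entropy-regularized OT cost (log-domain Sinkhorn for numerical stability)
    between node embeddings x (n x d) and a prototype point cloud (m x d),
    with uniform marginals; the result is differentiable in both inputs."""
    n, m = x.size(0), prototype.size(0)
    C = torch.cdist(x, prototype) ** 2                     # pairwise squared distances
    log_a = torch.full((n,), -math.log(n))
    log_b = torch.full((m,), -math.log(m))
    f, g = torch.zeros(n), torch.zeros(m)
    for _ in range(n_iters):                               # dual potential updates
        f = eps * (log_a - torch.logsumexp((g[None, :] - C) / eps, dim=1))
        g = eps * (log_b - torch.logsumexp((f[:, None] - C) / eps, dim=0))
    plan = torch.exp((f[:, None] + g[None, :] - C) / eps)  # (approximately) optimal coupling
    return (plan * C).sum()

# toy usage: 7 node embeddings against one prototype cloud of 4 free points,
# which would be trained jointly with the GNN
nodes = torch.randn(7, 16)
prototype = torch.randn(4, 16, requires_grad=True)
graph_feature = sinkhorn_cost(nodes, prototype)
graph_feature.backward()
print(prototype.grad.shape)  # (4, 16)
```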


10. Practical Accelerated Optimization on Riemannian Manifolds

(Under review)

With: Foivos Alimisis, Antonio Orvieto, Aurelien Lucchi

[pdf]

Abstract. We develop a new Riemannian descent algorithm with an accelerated rate of convergence. We focus on functions that are geodesically convex or weakly-quasi-convex, which are weaker function classes compared to prior work that has considered geodesically strongly convex functions. Our proof of convergence relies on a novel estimate sequence which allows us to demonstrate the dependency of the convergence rate on the curvature of the manifold. We validate our theoretical results empirically on several optimization problems defined on a sphere and on the manifold of positive definite matrices.
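
For context, the sketch below shows the basic (non-accelerated) Riemannian gradient descent primitives on the unit sphere, one of the manifolds considered in the experiments; the accelerated method and its estimate sequence are in the paper, and the toy objective and step size here are only illustrative.

```python
import numpy as np

def sphere_exp(x, v):
    """Exponential map on the unit sphere: move from x along tangent vector v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return x
    return np.cos(norm_v) * x + np.sin(norm_v) * (v / norm_v)

def project_tangent(x, g):
    """Project a Euclidean gradient onto the tangent space at x."""
    return g - np.dot(g, x) * x

# toy objective on the sphere: f(x) = x^T A x (a Rayleigh quotient;
# the minimizer is an eigenvector of the smallest eigenvalue)
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); A = (A + A.T) / 2

x = rng.standard_normal(5); x /= np.linalg.norm(x)
lr = 0.1
for _ in range(500):                         # plain Riemannian gradient descent
    riem_grad = project_tangent(x, 2 * A @ x)
    x = sphere_exp(x, -lr * riem_grad)

print("f(x):", x @ A @ x, "  min eigenvalue:", np.linalg.eigvalsh(A).min())
```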


9. Computationally Tractable Riemannian Manifolds for Graph Embeddings

(Under review)

With: Calin Cruceru, Octavian-Eugen Ganea

[pdf]



PUBLISHED IN CONFERENCE PROCEEDINGS:

8. ICML 2020. Constant Curvature Graph Convolutional Networks

With: Gregor Bachmann, Octavian-Eugen Ganea

[pdf, slides, blogpost]

Abstract. Interest has been rising lately towards methods representing data in non-Euclidean spaces, e.g. hyperbolic or spherical, that provide specific inductive biases useful for certain real-world data properties, e.g. scale-free, hierarchical or cyclical. However, the popular graph neural networks are currently limited in modeling data only via Euclidean geometry and associated vector space operations. Here, we bridge this gap by proposing mathematically grounded generalizations of graph convolutional networks (GCN) to (products of) constant curvature spaces. We do this by i) introducing a unified formalism that can interpolate smoothly between all geometries of constant curvature, ii) leveraging gyro-barycentric coordinates that generalize the classic Euclidean concept of the center of mass. Our models smoothly recover their Euclidean counterparts when the curvature goes to zero from either side. Empirically, we outperform Euclidean GCNs in the tasks of node classification and distortion minimization for symbolic data exhibiting non-Euclidean behavior, according to their discrete curvature.
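
A small numerical sketch of the unified formalism: the curvature-dependent tangent function and the exponential map at the origin of the kappa-stereographic model, illustrating the smooth interpolation through zero curvature; the numerical thresholds and the test vector are assumptions.

```python
import numpy as np

def tan_k(x, kappa, eps=1e-12):
    """Curvature-dependent tangent: tan for kappa > 0 (spherical), tanh for
    kappa < 0 (hyperbolic), and the identity in the flat limit kappa -> 0."""
    if kappa > eps:
        return np.tan(np.sqrt(kappa) * x) / np.sqrt(kappa)
    if kappa < -eps:
        return np.tanh(np.sqrt(-kappa) * x) / np.sqrt(-kappa)
    return x  # Euclidean limit

def exp_map_origin(v, kappa):
    """Exponential map at the origin of the kappa-stereographic model."""
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return tan_k(norm, kappa) * v / norm

v = np.array([0.3, -0.1])
for kappa in (-1.0, -1e-6, 0.0, 1e-6, 1.0):
    print(kappa, exp_map_origin(v, kappa))
# the outputs for kappa = -1e-6 and +1e-6 are numerically indistinguishable from
# the flat case, showing the smooth transition between hyperbolic and spherical
```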


7. AISTATS 2020. A Continuous-time Perspective for Modeling Acceleration in Riemannian Optimization

(Poster)

With: Foivos Alimisis, Antonio Orvieto, Aurelien Lucchi

[pdf]

Abstract. We propose a novel second-order ODE as the continuous-time limit of a Riemannian accelerated gradient-based method on a manifold with curvature bounded from below. This ODE can be seen as a generalization of the second-order ODE derived for Euclidean spaces and can also serve as an analysis tool. We analyze the convergence behavior of this ODE for different types of functions, such as geodesically convex, strongly-convex and weakly-quasi-convex. We demonstrate how such an ODE can be discretized using a semi-implicit and Nesterov-inspired numerical integrator that empirically yields stable algorithms which are faithful to the continuous-time analysis and exhibit accelerated convergence.
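
The Euclidean special case gives a quick feel for the discretization: a semi-implicit Euler scheme for the accelerated-gradient ODE x'' + (3/t) x' + grad f(x) = 0, where the velocity is updated with the current gradient and the position with the updated velocity. The Riemannian version in the paper additionally requires exponential maps; the step size and toy objective below are assumptions.

```python
import numpy as np

def grad_f(x):
    # toy convex quadratic f(x) = 0.5 * ||x||^2
    return x

def semi_implicit_accelerated_flow(x0, steps=2000, h=0.05):
    """Semi-implicit Euler discretization of x'' + (3/t) x' + grad f(x) = 0:
    the velocity step uses the current position, the position step uses the
    freshly updated velocity."""
    x, v = x0.copy(), np.zeros_like(x0)
    losses = []
    for k in range(1, steps + 1):
        t = k * h
        v = v - h * ((3.0 / t) * v + grad_f(x))   # velocity step (current x)
        x = x + h * v                              # position step (updated v)
        losses.append(0.5 * np.dot(x, x))
    return losses

losses = semi_implicit_accelerated_flow(np.array([5.0, -3.0]))
print(losses[::500])   # the objective decays along the discretized trajectory
```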


6. ICLR 2020. Mixed-Curvature Variational Autoencoders

(Poster)

With: Ondrej Skopek, Octavian-Eugen Ganea

[pdf, github]

Abstract. It has been shown that using geometric spaces with non-zero curvature instead of plain Euclidean spaces with zero curvature improves performance on a range of Machine Learning tasks for learning representations. Recent work has leveraged these geometries to learn latent variable models like Variational Autoencoders (VAEs) in spherical and hyperbolic spaces with constant curvature. While these approaches work well on the particular kinds of data that they were designed for, e.g. tree-like data for a hyperbolic VAE, there exists no generic approach unifying all three models. We develop a Mixed-curvature Variational Autoencoder, an efficient way to train a VAE whose latent space is a product of constant curvature Riemannian manifolds, where the per-component curvature can be learned. This generalizes the Euclidean VAE to curved latent spaces, as the model essentially reduces to the Euclidean VAE if the curvatures of all latent space components go to 0.
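
A structural sketch of the latent space only: the reparameterized sample is drawn in the tangent space at the origin and mapped, chunk by chunk, onto a product of constant-curvature components with learnable curvatures. The wrapped-normal density correction used in the actual VAE objective is deliberately omitted, and the component dimensions are made up.

```python
import torch

def tan_k(x, kappa, eps=1e-5):
    """Smooth curvature-dependent tangent (tanh / identity / tan)."""
    if kappa < -eps:
        sk = torch.sqrt(-kappa)
        return torch.tanh(sk * x) / sk
    if kappa > eps:
        sk = torch.sqrt(kappa)
        return torch.tan(sk * x) / sk
    return x

def exp0(v, kappa):
    """Exponential map at the origin of one constant-curvature component."""
    norm = v.norm(dim=-1, keepdim=True).clamp_min(1e-9)
    return tan_k(norm, kappa) * v / norm

# latent space = product of three constant-curvature components,
# each with its own learnable curvature (hyperbolic / flat / spherical init)
curvatures = torch.nn.Parameter(torch.tensor([-1.0, 0.0, 1.0]))
dims = [4, 4, 4]

def reparameterize(mu_tangent, log_var):
    """Sample in the tangent space at the origin, then map each chunk onto its
    component manifold (the density correction term is omitted in this sketch)."""
    noise = torch.randn_like(mu_tangent)
    v = mu_tangent + noise * torch.exp(0.5 * log_var)
    chunks = torch.split(v, dims, dim=-1)
    return torch.cat([exp0(c, k) for c, k in zip(chunks, curvatures)], dim=-1)

# small variance keeps the spherical component inside its injectivity radius
z = reparameterize(torch.zeros(2, 12), torch.full((2, 12), -4.0))
print(z.shape)  # (2, 12): stereographic coordinates on H^4 x R^4 x S^4
```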


5. ICML 2019. Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities

(Oral)

With: Octavian-Eugen Ganea, Sylvain Gelly, Aliaksei Severyn

[pdf, slides]

Abstract. The softmax function on top of a final linear layer is the de facto method to output probability distributions in neural networks. In many applications such as language models or text generation, this model has to produce distributions over large output vocabularies. Recently, this has been shown to have limited representational capacity due to its connection with the rank bottleneck in matrix factorization. However, little is known about the limitations of linear-softmax for quantities of practical interest such as cross entropy or mode estimation, a direction that we theoretically and empirically explore here. As an efficient and effective solution to alleviate this issue, we propose to learn parametric monotonic functions on top of the logits. We theoretically investigate the rank increasing capabilities of such monotonic functions. Empirically, our method improves in two different quality metrics over the traditional softmax-linear layer in synthetic and real language model experiments, adding little time or memory overhead, while being comparable to the more computationally expensive mixture of softmaxes.
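
The sketch below shows one simple way to parameterize a learnable, monotonically increasing pointwise function on the logits (positive weights and slopes enforced via softplus); it is not necessarily the parameterization used in the paper, and the module name and layer sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicPointwise(nn.Module):
    """Learnable monotonically increasing scalar function applied elementwise to
    the logits before the softmax; monotonicity comes from constraining all
    weights and slopes to be positive (one possible parameterization)."""
    def __init__(self, n_terms=8):
        super().__init__()
        self.w = nn.Parameter(torch.randn(n_terms) * 0.1)   # mixture weights (>0 after softplus)
        self.s = nn.Parameter(torch.randn(n_terms) * 0.1)   # slopes (>0 after softplus)
        self.b = nn.Parameter(torch.randn(n_terms))         # biases (unconstrained)
        self.linear_slope = nn.Parameter(torch.zeros(()))   # residual linear term (>0 after softplus)

    def forward(self, logits):
        x = logits.unsqueeze(-1)                             # (..., vocab, 1)
        terms = F.softplus(self.w) * torch.tanh(F.softplus(self.s) * x + self.b)
        return F.softplus(self.linear_slope) * logits + terms.sum(dim=-1)

# usage: insert between the final linear layer and the softmax
vocab, hidden = 1000, 64
final_linear = nn.Linear(hidden, vocab)
nonlin = MonotonicPointwise()
h = torch.randn(32, hidden)
log_probs = F.log_softmax(nonlin(final_linear(h)), dim=-1)
print(log_probs.shape)  # (32, 1000)
```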


4. ICLR 2019. Riemannian Adaptive Optimization Methods

(Poster)

With: Octavian-Eugen Ganea

[pdf, pytorch, poster]

Abstract. Several first order stochastic optimization methods commonly used in the Euclidean domain such as stochastic gradient descent (SGD), accelerated gradient descent or variance reduced methods have already been adapted to certain Riemannian settings. However, some of the most popular of these optimization tools – namely Adam, Adagrad and the more recent Amsgrad – remain to be generalized to Riemannian manifolds. We discuss the difficulty of generalizing such adaptive schemes to the most agnostic Riemannian setting, and then provide algorithms and convergence proofs for geodesically convex objectives in the particular case of a product of Riemannian manifolds, in which adaptivity is implemented across manifolds in the Cartesian product. Our generalization is tight in the sense that choosing the Euclidean space as the Riemannian manifold yields the same algorithms and regret bounds as those already known for the standard algorithms. Experimentally, we show that Riemannian adaptive methods converge faster and to a lower training loss than their corresponding baselines on the realistic task of embedding the WordNet taxonomy in the Poincaré ball.
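
A minimal sketch of an Adam-like step on a manifold, here the unit sphere, with a Riemannian gradient, a transported first moment, and a scalar second moment per manifold, in the spirit of per-manifold adaptivity across a product; projection-based vector transport is used in place of exact parallel transport, and all hyperparameters and the toy objective are assumptions.

```python
import numpy as np

class Sphere:
    """Unit sphere with exponential map and projection-based vector transport."""
    def proj(self, x, v):
        return v - np.dot(v, x) * x
    def exp(self, x, v):
        n = np.linalg.norm(v)
        if n < 1e-12:
            return x
        return np.cos(n) * x + np.sin(n) * v / n
    def transp(self, x_new, v):
        return self.proj(x_new, v)

def riemannian_adam_step(man, x, egrad, m, v, t, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam-like step: Riemannian gradient, first moment carried along the
    trajectory by vector transport, scalar second moment for the whole manifold."""
    g = man.proj(x, egrad)                      # Riemannian gradient
    m = b1 * m + (1 - b1) * g                   # first moment (tangent vector)
    v = b2 * v + (1 - b2) * np.dot(g, g)        # second moment (scalar)
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    x_new = man.exp(x, -lr * m_hat / (np.sqrt(v_hat) + eps))
    m = man.transp(x_new, m)                    # move the momentum to the new point
    return x_new, m, v

# toy usage: minimize the Rayleigh quotient x^T A x on the sphere
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); A = (A + A.T) / 2
man, x = Sphere(), rng.standard_normal(5)
x /= np.linalg.norm(x)
m, v = np.zeros(5), 0.0
for t in range(1, 301):
    x, m, v = riemannian_adam_step(man, x, 2 * A @ x, m, v, t)
print("f(x):", x @ A @ x, "  min eigenvalue:", np.linalg.eigvalsh(A).min())
```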


3. ICLR 2019. Poincaré GloVe: Hyperbolic Word Embeddings

(Poster)

With: Alexandru Tifrea, Octavian-Eugen Ganea

[pdf, github, poster, blogpost]

Abstract. Words are not created equal. In fact, they form an aristocratic graph with a latent hierarchical structure that the next generation of unsupervised learned word embeddings should reveal. In this paper, justified by the notion of delta-hyperbolicity or tree-likeliness of a space, we propose to embed words in a Cartesian product of hyperbolic spaces, which we theoretically connect to the Gaussian word embeddings and their Fisher geometry. This connection allows us to introduce a novel principled hypernymy score for word embeddings. Moreover, we adapt the well-known GloVe algorithm to learn unsupervised word embeddings in this type of Riemannian manifold. We further explain how to solve the analogy task using the Riemannian parallel transport that generalizes vector arithmetic to this new type of geometry. Empirically, based on extensive experiments, we prove that our embeddings, trained unsupervised, are the first to simultaneously outperform strong and popular baselines on the tasks of similarity, analogy and hypernymy detection. In particular, for word hypernymy, we obtain new state-of-the-art results on fully unsupervised WBLESS classification accuracy.
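
For intuition, the sketch below evaluates the geodesic distance in a Cartesian product of Poincaré balls: the word vector is split into small hyperbolic factors and the per-factor distances are combined by the product metric. The factor dimensions and the toy vectors are assumptions made for the example.

```python
import numpy as np

def poincare_dist(x, y, eps=1e-9):
    """Geodesic distance between two points of the Poincaré ball."""
    sq = np.sum((x - y) ** 2)
    denom = (1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2))
    return np.arccosh(1 + 2 * sq / (denom + eps))

def product_dist(x, y, n_factors):
    """Distance in a Cartesian product of Poincaré balls: split the embedding
    into equal chunks (one per hyperbolic factor) and combine the per-factor
    geodesic distances with the product metric."""
    xs, ys = np.split(x, n_factors), np.split(y, n_factors)
    return np.sqrt(sum(poincare_dist(a, b) ** 2 for a, b in zip(xs, ys)))

# toy usage: two 100-dimensional word vectors viewed as 50 copies of the 2D ball
rng = np.random.default_rng(0)
w1 = rng.uniform(-0.1, 0.1, size=100)
w2 = rng.uniform(-0.1, 0.1, size=100)
print(product_dist(w1, w2, n_factors=50))
```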


2. NIPS 2018. Hyperbolic Neural Networks

(Spotlight, top 4% of submissions)

With: Octavian-Eugen Ganea, Thomas Hofmann

[pdf, github, pytorch, poster, video, blogpost]

Abstract. Hyperbolic spaces have recently gained momentum in the context of machine learning due to their high capacity and tree-likeliness properties. However, the representational power of hyperbolic geometry is not yet on par with Euclidean geometry, mostly because of the absence of corresponding hyperbolic neural network layers. This makes it hard to use hyperbolic embeddings in downstream tasks. Here, we bridge this gap in a principled manner by combining the formalism of Möbius gyrovector spaces with the Riemannian geometry of the Poincaré model of hyperbolic spaces. As a result, we derive hyperbolic versions of important deep learning tools: multinomial logistic regression, feed-forward and recurrent neural networks such as gated recurrent units. This makes it possible to embed sequential data and perform classification in the hyperbolic space. Empirically, we show that, even if hyperbolic optimization tools are limited, hyperbolic sentence embeddings either outperform or are on par with their Euclidean variants on textual entailment and noisy-prefix recognition tasks.
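
A minimal sketch of the gyrovector building blocks (curvature -1): Möbius addition, the Möbius matrix-vector product, and a hyperbolic feed-forward layer composed from them; the numerical guards and toy shapes are assumptions, and this is only a sketch rather than the paper's full layer definitions.

```python
import numpy as np

def mobius_add(x, y):
    """Möbius addition in the Poincaré ball (curvature -1)."""
    xy = np.dot(x, y)
    x2, y2 = np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * xy + y2) * x + (1 - x2) * y
    den = 1 + 2 * xy + x2 * y2
    return num / den

def mobius_matvec(M, x, eps=1e-9):
    """Möbius matrix-vector product: equivalent to mapping to the tangent space
    at the origin, applying M, and mapping back."""
    x_norm = np.linalg.norm(x) + eps
    mx = M @ x
    mx_norm = np.linalg.norm(mx) + eps
    return np.tanh(mx_norm / x_norm * np.arctanh(x_norm)) * mx / mx_norm

def hyperbolic_linear(M, b, x):
    """A hyperbolic feed-forward layer: Möbius matrix-vector product followed by
    Möbius translation by a bias living in the ball."""
    return mobius_add(mobius_matvec(M, x), b)

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 5)) * 0.1
b = rng.uniform(-0.1, 0.1, size=3)
x = rng.uniform(-0.3, 0.3, size=5) / 5       # a point well inside the ball
y = hyperbolic_linear(M, b, x)
print(y, np.linalg.norm(y) < 1)              # the output stays in the ball
```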


1. ICML 2018. Hyperbolic Entailment Cones for Learning Hierarchical Embeddings

(Oral)

With: Octavian-Eugen Ganea, Thomas Hofmann

[pdf, github, poster, slides]

Abstract. Learning graph representations via low-dimensional embeddings that preserve relevant network properties is an important class of problems in machine learning. We here present a novel method to embed directed acyclic graphs. Following prior work, we first advocate for using hyperbolic spaces which provably model tree-like structures better than Euclidean geometry. Second, we view hierarchical relations as partial orders defined using a family of nested geodesically convex cones. We prove that these entailment cones admit an optimal shape with a closed form expression both in the Euclidean and hyperbolic spaces, and they canonically define the embedding learning process. Experiments show significant improvements of our method over strong recent baselines both in terms of representational capacity and generalization.
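
The sketch below checks cone membership numerically in the Poincaré ball: it compares the cone half-aperture at a point with the angle between the outward axis and the initial direction of the geodesic toward a candidate point (obtained via Möbius addition). The constant K and the example points are assumptions, and the paper's closed-form angle expression is replaced here by an equivalent numeric computation.

```python
import numpy as np

K = 0.1   # cone-width constant; the paper uses a small fixed K (value assumed here)

def mobius_add(x, y):
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    return ((1 + 2 * xy + y2) * x + (1 - x2) * y) / (1 + 2 * xy + x2 * y2)

def half_aperture(x):
    """Half-aperture of the hyperbolic entailment cone at x: wider near the
    origin (general concepts), narrower near the boundary (specific concepts)."""
    r = np.linalg.norm(x)
    return np.arcsin(np.clip(K * (1 - r ** 2) / r, -1.0, 1.0))

def cone_angle(x, y):
    """Angle at x between the cone axis (pointing away from the origin) and the
    geodesic from x to y; the initial direction of that geodesic is parallel to
    (-x) Möbius-plus y, and angles in the conformal ball model are Euclidean."""
    direction = mobius_add(-x, y)
    cos = np.dot(x, direction) / (np.linalg.norm(x) * np.linalg.norm(direction))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def entailment_energy(x, y):
    """max(0, angle - aperture): zero iff y lies inside the cone rooted at x."""
    return max(0.0, cone_angle(x, y) - half_aperture(x))

parent = np.array([0.3, 0.0])        # closer to the origin: more general concept
child = np.array([0.6, 0.05])        # further out, roughly along the same ray
unrelated = np.array([-0.1, 0.6])
print(entailment_energy(parent, child))      # ~0: child lies inside the cone
print(entailment_energy(parent, unrelated))  # > 0: not entailed
```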