
Statistics

New submissions

[ total of 75 entries: 1-75 ]

New submissions for Fri, 23 Feb 18

[1]  arXiv:1802.07721 [pdf, other]
Title: Discussion on "Sparse graphs using exchangeable random measures" by Francois Caron and Emily B. Fox
Authors: Mingyuan Zhou
Subjects: Methodology (stat.ME)

This is a discussion on "Sparse graphs using exchangeable random measures" by Francois Caron and Emily B. Fox, published in Journal of the Royal Statistical Society, Series B, 2017.

[2]  arXiv:1802.07756 [pdf]
Title: Determining the best classifier for predicting the value of a boolean field on a blood donor database
Authors: Ritabrata Maiti
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Motivation: Thanks to digitization, we often have access to large databases consisting of various fields of information, ranging from numbers to texts and even boolean values. Such databases lend themselves especially well to machine learning, classification and big data analysis tasks. We are able to train classifiers using already existing data and use them to predict the values of a certain field, given information about the other fields. More specifically, in this study we look at the Electronic Health Records (EHRs) that are compiled by hospitals. These EHRs are a convenient means of accessing the data of individual patients, but their processing as a whole still remains a challenge. However, EHRs that are composed of coherent, well-tabulated structures lend themselves quite well to machine learning via the use of classifiers. In this study, we look at the Blood Transfusion Service Center Data Set (data taken from the Blood Transfusion Service Center in Hsin-Chu City in Taiwan). We used the scikit-learn machine learning library in Python. From Support Vector Machines (SVM) we use Support Vector Classification (SVC), and from the linear model module we import the Perceptron. We also used the KNeighborsClassifier and decision tree classifiers. We segmented the database into two parts: the first was used to train the classifiers, and the second to verify whether the classifier predictions matched the actual values.
Contact: ritabratamaiti@hiretrex.com
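
The pipeline described above maps directly onto standard scikit-learn calls. A minimal sketch under the stated setup (the synthetic arrays below stand in for the blood-donor table, which is not reproduced here):

```python
# Sketch of the described workflow: train several scikit-learn classifiers
# on one half of a dataset and score them on the other half.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.linear_model import Perceptron
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(748, 4))          # stand-in for the 4 predictor fields
y = rng.integers(0, 2, size=748)       # stand-in for the boolean target field

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

for clf in [SVC(), Perceptron(), KNeighborsClassifier(), DecisionTreeClassifier()]:
    clf.fit(X_train, y_train)
    print(type(clf).__name__, clf.score(X_test, y_test))
```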

[3]  arXiv:1802.07762 [pdf]
Title: Aggregating the response in time series regression models, applied to weather-related cardiovascular mortality
Subjects: Applications (stat.AP)

In environmental epidemiology studies, health response data (e.g. hospitalization or mortality) are often noisy because of hospital organization and other social factors. The noise in the data can hide the true signal related to the exposure. The signal can be unveiled by performing a temporal aggregation on health data and then using it as the response in regression analysis. A general methodology is introduced to account for the particularities of an aggregated response in a regression setting. This methodology can be used with the regression models commonly applied in weather-related health studies, such as generalized additive models (GAM) and distributed lag nonlinear models (DLNM). In particular, the residuals are modelled using an autoregressive-moving average (ARMA) model to account for the temporal dependence. The proposed methodology is illustrated by modelling the influence of temperature on cardiovascular mortality in Canada. A comparison with classical DLNMs is provided and several aggregation methods are compared. Results show that there is an increase in the fit quality when the response is aggregated, and that the estimated relationship focuses more on the outcome over several days than the classical DLNM does. More precisely, among the various investigated aggregation schemes, an aggregation with an asymmetric Epanechnikov kernel was found to be the most suitable for studying the temperature-mortality relationship.
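
To make the aggregation step concrete, here is a minimal sketch of kernel-weighted temporal aggregation of a daily response series with a (symmetric) Epanechnikov kernel; the paper's preferred asymmetric variant and its exact weights are not reproduced here:

```python
# Kernel-weighted temporal aggregation of a daily health-response series.
import numpy as np

def epanechnikov_weights(width):
    u = np.linspace(-1, 1, width)
    w = 0.75 * (1 - u**2)              # Epanechnikov kernel on [-1, 1]
    return w / w.sum()                 # normalize so weights sum to one

def aggregate(series, width=7):
    w = epanechnikov_weights(width)
    # mode="same" keeps the aggregated response aligned with the original days
    return np.convolve(series, w, mode="same")

daily_deaths = np.random.poisson(20, size=365).astype(float)  # toy series
smoothed = aggregate(daily_deaths)
```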

[4]  arXiv:1802.07773 [pdf, other]
Title: Counting Motifs with Graph Sampling
Subjects: Statistics Theory (math.ST); Discrete Mathematics (cs.DM); Machine Learning (stat.ML)

Applied researchers often construct a network from a random sample of nodes in order to infer properties of the parent network. Two of the most widely used sampling schemes are subgraph sampling, where we sample each vertex independently with probability $p$ and observe the subgraph induced by the sampled vertices, and neighborhood sampling, where we additionally observe the edges between the sampled vertices and their neighbors.
In this paper, we study the problem of estimating the number of motifs as induced subgraphs under both models from a statistical perspective. We show that, for any connected motif $h$ on $k$ vertices, to estimate $s=\mathsf{s}(h,G)$, the number of copies of $h$ in the parent graph $G$ of maximum degree $d$, with a multiplicative error of $\epsilon$: (a) for subgraph sampling, the optimal sampling ratio $p$ is $\Theta_{k}(\max\{ (s\epsilon^2)^{-\frac{1}{k}}, \; \frac{d^{k-1}}{s\epsilon^{2}} \})$, achieved by Horvitz-Thompson-type estimators; (b) for neighborhood sampling, we propose a family of estimators, encompassing and outperforming the Horvitz-Thompson estimator and achieving the sampling ratio $O_{k}(\min\{ (\frac{d}{s\epsilon^2})^{\frac{1}{k-1}}, \; \sqrt{\frac{d^{k-2}}{s\epsilon^2}}\})$, which is shown to be optimal for all motifs with at most $4$ vertices and for cliques of all sizes.
The matching minimax lower bounds are established using certain algebraic properties of subgraph counts. These results quantify how much more informative neighborhood sampling is than subgraph sampling, as empirically verified by experiments on both synthetic and real-world data. We also address the issue of adaptation to the unknown maximum degree, and study specific problems for parent graphs with additional structures, e.g., trees or planar graphs.
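
As an illustration of the Horvitz-Thompson estimator under subgraph sampling, the following sketch estimates the triangle count (a motif on $k=3$ vertices): each vertex is kept independently with probability $p$, so a given triangle survives with probability $p^3$. The toy graph and parameters are illustrative:

```python
# Horvitz-Thompson estimate of the triangle count under subgraph sampling.
import networkx as nx
import numpy as np

def ht_triangle_estimate(G, p, rng):
    kept = [v for v in G if rng.random() < p]   # sample vertices w.p. p
    H = G.subgraph(kept)                        # observed induced subgraph
    observed = sum(nx.triangles(H).values()) // 3
    return observed / p**3                      # inverse-probability weighting

rng = np.random.default_rng(1)
G = nx.gnp_random_graph(500, 0.05, seed=1)
true_count = sum(nx.triangles(G).values()) // 3
est = np.mean([ht_triangle_estimate(G, 0.3, rng) for _ in range(50)])
print(true_count, est)
```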

[5]  arXiv:1802.07807 [pdf, ps, other]
Title: A Guide to Comparing the Performance of VA Algorithms
Authors: Samuel J. Clark
Subjects: Applications (stat.AP)

The literature comparing the performance of algorithms for assigning cause of death using verbal autopsy data is fractious and does not reach a consensus on which algorithms perform best, or even how to do the comparison. This manuscript explains the challenges and suggests a way forward. A universal challenge is the lack of standard training and testing data. This limits meaningful comparisons between algorithms and, further, limits the ability of any algorithm to classify verbal autopsy deaths by cause in a way that is widely generalizable across regions and through time. Verbal autopsy algorithms utilize a variety of information to describe the relationship between verbal autopsy symptoms and causes of death - called symptom-cause information (SCI). A crowd-sourced, public archive of SCI managed by the World Health Organization (WHO) is suggested as a way to address the lack of SCI for developing, testing, and comparing verbal autopsy coding algorithms, and additionally, as a way to ensure that algorithm-assigned causes of death are as accurate and comparable across regions and through time as possible.

[6]  arXiv:1802.07838 [pdf, other]
Title: Mutual Assent or Unilateral Nomination? A Performance Comparison of Intersection and Union Rules for Integrating Self-reports of Social Relationships
Subjects: Methodology (stat.ME)

Data collection designs for social network studies frequently involve asking both parties to a potential relationship to report on the presence or absence of that relationship, resulting in two measurements per potential tie. When inferring the underlying network, is it better to estimate a tie as present only when both parties report it as present, or when either reports it? Employing several data sets in which network structure can be well determined from large numbers of informant reports, we examine the performance of these two simple rules. Our analysis shows better results for mutual assent across all data sets examined. A theoretical analysis of estimator performance shows that the best rule depends on both the underlying error rates and the sparsity of the underlying network, with sparsity driving the superiority of mutual assent in typical social network settings.
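
A minimal sketch of the two rules, assuming a boolean report matrix R in which R[i, j] holds person i's report on the tie between i and j (toy data, not the study's):

```python
# The two integration rules for self-reported ties.
import numpy as np

rng = np.random.default_rng(0)
n = 6
R = rng.random((n, n)) < 0.3          # noisy self-reports (toy data)
np.fill_diagonal(R, False)

mutual_assent = R & R.T               # intersection rule: both parties report it
unilateral = R | R.T                  # union rule: either party reports it
print("mutual assent ties:", mutual_assent.sum() // 2)
print("unilateral ties:  ", unilateral.sum() // 2)
```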

[7]  arXiv:1802.07927 [pdf, other]
Title: The Hidden Vulnerability of Distributed Learning in Byzantium
Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

While machine learning is going through an era of celebrated success, concerns have been raised about the vulnerability of its backbone: stochastic gradient descent (SGD). Recent approaches have been proposed to ensure the robustness of distributed SGD against adversarial (Byzantine) workers sending poisoned gradients during the training phase. Some of these approaches have been proven Byzantine-resilient: they ensure the convergence of SGD despite the presence of a minority of adversarial workers.
We show in this paper that convergence is not enough. In high dimension $d \gg 1$, an adversary can build on the loss function's non-convexity to make SGD converge to ineffective models. More precisely, we bring to light that existing Byzantine-resilient schemes leave a margin of poisoning of $\Omega\left(f(d)\right)$, where $f(d)$ increases at least like $\sqrt[p]{d}$. Based on this leeway, we build a simple attack, and experimentally show that it is strongly, at times completely, effective on CIFAR-10 and MNIST.
We introduce Bulyan, and prove it significantly reduces the attacker's leeway to a narrow $O(\frac{1}{\sqrt{d}})$ bound. We empirically show that Bulyan does not suffer the fragility of existing aggregation rules and, at a reasonable cost in terms of required batch size, achieves convergence as if only non-Byzantine gradients had been used to update the model.
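
The paper defines Bulyan precisely; as a hedged illustration of the kind of robust aggregation involved, the sketch below implements only a coordinate-wise trimmed mean (averaging the beta values closest to the coordinate-wise median), omitting Bulyan's Krum-style selection stage:

```python
# Coordinate-wise trimmed mean, the flavor of aggregation used against
# poisoned gradients (not the full Bulyan rule from the paper).
import numpy as np

def coordinatewise_trimmed_mean(grads, beta):
    G = np.stack(grads)                        # shape (n_workers, d)
    med = np.median(G, axis=0)
    # per coordinate, indices of the beta values closest to the median
    order = np.argsort(np.abs(G - med), axis=0)[:beta]
    return np.take_along_axis(G, order, axis=0).mean(axis=0)

grads = [np.random.randn(10) for _ in range(9)]
grads[0] += 100.0                              # one poisoned gradient
agg = coordinatewise_trimmed_mean(grads, beta=5)
```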

[8]  arXiv:1802.07928 [pdf, other]
Title: Asynchronous Byzantine Machine Learning
Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

Asynchronous distributed machine learning solutions have proven very effective so far, but always assuming perfectly functioning workers. In practice, some of the workers can however exhibit Byzantine behavior, caused by hardware failures, software bugs, corrupt data, or even malicious attacks. We introduce \emph{Kardam}, the first distributed asynchronous stochastic gradient descent (SGD) algorithm that copes with Byzantine workers. Kardam consists of two complementary components: a filtering and a dampening component. The first is scalar-based and ensures resilience against up to $\frac{1}{3}$ Byzantine workers. Essentially, this filter leverages the Lipschitzness of cost functions and acts as a self-stabilizer against Byzantine workers that would attempt to corrupt the progress of SGD. The dampening component bounds the convergence rate by adjusting to stale information through a generic gradient weighting scheme. We prove that Kardam guarantees almost sure convergence in the presence of asynchrony and Byzantine behavior, and we derive its convergence rate. We evaluate Kardam on the CIFAR-100 and EMNIST datasets and measure its overhead with respect to non-Byzantine-resilient solutions. We empirically show that Kardam does not introduce additional noise to the learning procedure but does induce a slowdown (the cost of Byzantine resilience) that we both theoretically and empirically show to be less than $f/n$, where $f$ is the number of Byzantine failures tolerated and $n$ the total number of workers. Interestingly, we also empirically observe that the dampening component is interesting in its own right, as it enables building an SGD algorithm that outperforms alternative staleness-aware asynchronous competitors in environments with honest workers.
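
The exact Kardam filter is specified in the paper; the sketch below is only a hedged illustration of a scalar Lipschitz-style test in the same spirit, accepting a worker's gradient when its empirical growth rate is consistent with the coefficients seen so far (the quantile threshold rule here is an assumption, not Kardam's):

```python
# Illustrative Lipschitz-style gradient filter (not the actual Kardam rule).
import numpy as np

def lipschitz_coeff(g_new, g_old, x_new, x_old):
    # empirical growth rate of the gradient between two model states
    return np.linalg.norm(g_new - g_old) / max(np.linalg.norm(x_new - x_old), 1e-12)

def accept(g_new, g_old, x_new, x_old, history, quantile=0.66):
    # accept if the coefficient is no larger than a quantile of past coefficients
    k = lipschitz_coeff(g_new, g_old, x_new, x_old)
    ok = (not history) or k <= np.quantile(history, quantile)
    if ok:
        history.append(k)
    return ok

rng = np.random.default_rng(0)
history = []
x_old, x_new = np.zeros(5), 0.1 * rng.normal(size=5)
g_old, g_new = rng.normal(size=5), rng.normal(size=5)
print(accept(g_new, g_old, x_new, x_old, history))
```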

[9]  arXiv:1802.07954 [pdf, other]
Title: The State of the Art in Integrating Machine Learning into Visual Analytics
Journal-ref: Computer Graphics Forum, Wiley, 2017, 36 (8), pp. 458-486. DOI: 10.1111/cgf.13092
Subjects: Machine Learning (stat.ML); Human-Computer Interaction (cs.HC); Learning (cs.LG)

Visual analytics systems combine machine learning or other analytic techniques with interactive data visualization to promote sensemaking and analytical reasoning. It is through such techniques that people can make sense of large, complex data. While progress has been made, the tactful combination of machine learning and data visualization is still under-explored. This state-of-the-art report presents a summary of the progress that has been made by highlighting and synthesizing select research advances. Further, it presents opportunities and challenges to enhance the synergy between machine learning and visual analytics for impactful future research directions.

[10]  arXiv:1802.07998 [pdf, other]
Title: Robust estimators in a generalized partly linear regression model under monotony constraints
Subjects: Statistics Theory (math.ST)

In this paper, we consider the situation in which the observations follow an isotonic generalized partly linear model. Under this model, the mean of the responses is modelled, through a link function, linearly on some covariates and nonparametrically on a univariate regressor, in such a way that the nonparametric component is assumed to be a monotone function. A class of robust estimates for the monotone nonparametric component and for the regression parameter, related to the linear one, is defined. The robust estimators are based on a spline approach combined with a score function which bounds large values of the deviance. As an application, we consider the isotonic partly linear log-Gamma regression model. Through a Monte Carlo study, we investigate the performance of the proposed estimators under a partly linear log-Gamma regression model with an increasing nonparametric component.

[11]  arXiv:1802.08004 [pdf, ps, other]
Title: The use of sampling weights in the M-quantile random-effects regression: an application to PISA mathematics scores
Subjects: Statistics Theory (math.ST)

M-quantile random-effects regression represents an interesting approach for modelling multilevel data when the interest of researchers is focused on the conditional quantiles. When data are based on complex survey designs, sampling weights have to be incorporated in the analysis. A pseudo-likelihood approach for accommodating sampling weights in the M-quantile random-effects regression is presented. The proposed methodology is applied to the Italian sample of the "Program for International Student Assessment 2015" survey in order to study the gender gap in mathematics at various quantiles of the conditional distribution. Findings offer a possible explanation of the low share of females in "Science, Technology, Engineering and Mathematics" sectors.

[12]  arXiv:1802.08012 [pdf, other]
Title: Learning Topic Models by Neighborhood Aggregation
Authors: Ryohei Hisano
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Topic models are among the most frequently used models in machine learning due to their high interpretability and modular structure. However, extending a topic model to include a supervisory signal, incorporate pre-trained word embedding vectors, or add a nonlinear output function is not an easy task, because one has to resort to a highly intricate approximate inference procedure. In this paper, we show that topic models can be viewed as performing a neighborhood aggregation algorithm where messages are passed through a network defined over words. Under this network view of topic models, nodes correspond to words in a document, and edges correspond either to a relationship between co-occurring words in a document or to a relationship between occurrences of the same word across the corpus. The network view allows us to extend the model to include supervisory signals, incorporate pre-trained word embedding vectors and add a nonlinear output function to the model in a simple manner. Moreover, we describe a simple way to train the model that is well suited to a semi-supervised setting, where we only have supervisory signals for some portion of the corpus and the goal is to improve prediction performance on the held-out data. Through careful experiments, we show that our approach outperforms a state-of-the-art supervised Latent Dirichlet Allocation implementation in both held-out document classification tasks and topic coherence.

[13]  arXiv:1802.08114 [pdf, other]
Title: Sparse Bayesian dynamic network models, with genomics applications
Subjects: Methodology (stat.ME)

Network models have become an important topic in modern statistics, and the evolution of network structure over time is an important new area of study, relevant to a range of applications. An important application of statistical network modelling is in genomics: network models are a natural way to describe and analyse patterns of interactions between genes and their products. However, whilst network models are well established in genomics, historically these models have mostly been static network models, ignoring the dynamic nature of genomic processes.
In this work, we propose a model to infer dynamic genomic network structure, based on single-cell measurements of gene-expression counts. Our model draws on ideas from the Bayesian lasso and from copula modelling, and is implemented efficiently by combining Gibbs- and slice-sampling techniques. We apply the modelling to data from neural development, and infer changes in network structure which match current biological knowledge, as well as discovering novel network structures which identify potential targets for further experimental investigation by neuro-biologists.

[14]  arXiv:1802.08139 [pdf, other]
Title: Path-Specific Counterfactual Fairness
Subjects: Machine Learning (stat.ML)

We consider the problem of learning fair decision systems in complex scenarios in which a sensitive attribute might affect the decision along both fair and unfair pathways. We introduce a causal approach to disregard effects along unfair pathways that simplifies and generalizes previous literature. Our method corrects observations adversely affected by the sensitive attribute, and uses these to form a decision. This avoids disregarding fair information, and does not require an often intractable computation of the path-specific effect. We leverage recent developments in deep learning and approximate inference to achieve a solution that is widely applicable to complex, non-linear scenarios.

[15]  arXiv:1802.08161 [pdf, other]
Title: Consistency of the maximum likelihood estimator in seasonal hidden Markov models
Authors: Augustin Touron (UP11, EDF R&D)
Comments: arXiv admin note: text overlap with arXiv:1710.08112
Subjects: Applications (stat.AP); Methodology (stat.ME)

In this paper, we introduce a variant of hidden Markov models in which the transition probabilities between the states, as well as the emission distributions, are not constant in time but vary in a periodic manner. This class of models, which we will call seasonal hidden Markov models (SHMM), is particularly useful in practice, as many applications involve a seasonal behaviour. However, up to now there have been no theoretical results regarding this kind of model. We show that under mild assumptions, SHMM are identifiable: we can identify the transition matrices and the emission distributions from the joint distribution of the observations over a period, up to state labelling. We also give sufficient conditions for the strong consistency of the maximum likelihood estimator (MLE). These results are applied to simulated data, using the EM algorithm to compute the MLE. Finally, we show how SHMM can be used in real-world applications by applying our model to precipitation data, with mixtures of exponential distributions as emission distributions.

[16]  arXiv:1802.08163 [pdf, other]
Title: An Analysis of Categorical Distributional Reinforcement Learning
Subjects: Machine Learning (stat.ML)

Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance. This was demonstrated by the recently proposed C51 algorithm, based on categorical distributional reinforcement learning (CDRL) [Bellemare et al., 2017]. However, the theoretical properties of CDRL algorithms are not yet well understood. In this paper, we introduce a framework to analyse CDRL algorithms, establish the importance of the projected distributional Bellman operator in distributional RL, draw fundamental connections between CDRL and the Cram\'er distance, and give a proof of convergence for sample-based categorical distributional reinforcement learning algorithms.
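
A concrete piece of CDRL referenced above is the projected Bellman update; the sketch below implements the standard categorical projection used by C51, which redistributes the mass of each shifted atom $r + \gamma z_i$ onto its two nearest points of the fixed support (toy support and parameters):

```python
# Categorical projection of a shifted return distribution onto a fixed support.
import numpy as np

def categorical_projection(probs, r, gamma, z):
    dz = z[1] - z[0]
    tz = np.clip(r + gamma * z, z[0], z[-1])     # shifted, clipped atoms
    b = (tz - z[0]) / dz                         # fractional support index
    lo, hi = np.floor(b).astype(int), np.ceil(b).astype(int)
    out = np.zeros_like(z)
    neq = lo != hi
    np.add.at(out, lo[neq], probs[neq] * (hi - b)[neq])  # mass to lower atom
    np.add.at(out, hi[neq], probs[neq] * (b - lo)[neq])  # mass to upper atom
    np.add.at(out, lo[~neq], probs[~neq])        # atoms landing exactly on grid
    return out

z = np.linspace(-10.0, 10.0, 51)                 # C51 uses 51 atoms
probs = np.full(51, 1.0 / 51)
projected = categorical_projection(probs, r=1.0, gamma=0.99, z=z)
assert abs(projected.sum() - 1.0) < 1e-9         # projection preserves mass
```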

[17]  arXiv:1802.08167 [pdf, other]
Title: Learning Causally-Generated Stationary Time Series
Comments: 13 pages, 7 figures, 2 tables, includes appendices
Subjects: Machine Learning (stat.ML)

We present the Causal Gaussian Process Convolution Model (CGPCM), a doubly nonparametric model for causal, spectrally complex dynamical phenomena. The CGPCM is a generative model in which white noise is passed through a causal, nonparametric-window moving-average filter, a construction that we show to be equivalent to a Gaussian process with a nonparametric kernel that is biased towards causally-generated signals. We develop enhanced variational inference and learning schemes for the CGPCM and its previous acausal variant, the GPCM (Tobar et al., 2015b), that significantly improve statistical accuracy. These modelling and inferential contributions are demonstrated on a range of synthetic and real-world signals.

[18]  arXiv:1802.08175 [pdf, other]
Title: Algebra and geometry of tensors for modeling rater agreement data
Comments: 24 pages, 8 figures
Subjects: Statistics Theory (math.ST)

We study three different quasi-symmetry models and three different mixture models of $n\times n\times n$ tensors for modeling rater agreement data. For these models we give a geometric description of the associated varieties and we study their invariants distinguishing between the case $n=2$ and the case $n>2$. Finally, for the two models for pairwise agreement we state some results about the pairwise Cohen's $\kappa$ coefficients.

[19]  arXiv:1802.08178 [pdf, other]
Title: Correlation-Adjusted Survival Scores for High-Dimensional Variable Selection
Subjects: Methodology (stat.ME)

Background: The development of classification methods for personalized medicine is highly dependent on the identification of predictive genetic markers. In survival analysis it is often necessary to discriminate between influential and non-influential markers. Usually, the first step is to perform a univariate screening step that ranks the markers according to their associations with the outcome. It is common to perform screening using Cox scores, which quantify the associations between survival and each of the markers individually. Since Cox scores do not account for dependencies between the markers, their use is suboptimal in the presence of highly correlated markers. Methods: As an alternative to the Cox score, we propose the correlation-adjusted regression survival (CARS) score for right-censored survival outcomes. By removing the correlations between the markers, the CARS score quantifies the associations between the outcome and the set of "de-correlated" marker values. Estimation of the scores is based on inverse probability weighting, which is applied to log-transformed event times. For high-dimensional data, estimation is based on shrinkage techniques. Results: The consistency of the CARS score is proven under mild regularity conditions. In simulations, survival models based on CARS score rankings achieved higher areas under the precision-recall curve than competing methods. Two example applications to prostate and breast cancer confirmed these results. CARS scores are implemented in the R package carSurv. Conclusions: In research applications involving high-dimensional genetic data, the use of CARS scores for marker selection is a favorable alternative to Cox scores even when correlations between covariates are low. Having a straightforward interpretation and low computational requirements, CARS scores are an easy-to-use screening tool in personalized medicine research.

[20]  arXiv:1802.08179 [pdf, other]
Title: Elements of the Kopula (eventological copula) theory
Comments: PDFLaTeX, 45 pages, 25 figures
Journal-ref: Proc. of the XIV Intern. FAMEMS Conf. on Financial and Actuarial Math and Eventology of Multivariate Statistics; Krasnoyarsk, SFU (Oleg Vorobyev ed.), (2015) 78-122
Subjects: Other Statistics (stat.OT)

The concept of the Kopula (eventological copula), new to probability theory and eventology, is introduced. A theorem on the characterization of sets of events by Kopulas is proved, which serves as the eventological pre-image of Sklar's well-known theorem on copulas (1959). The Kopulas of doublets and triplets of events are given, as well as those of some N-sets of events.

[21]  arXiv:1802.08183 [pdf, other]
Title: Projection-Free Online Optimization with Stochastic Gradient: From Convexity to Submodularity
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Learning (cs.LG)

Online optimization has been a successful framework for solving large-scale problems under computational constraints and partial information. Current methods for online convex optimization require either a projection or exact gradient computation at each step, both of which can be prohibitively expensive for large-scale applications. At the same time, there is a growing trend of non-convex optimization in the machine learning community and a need for online methods. Continuous submodular functions, which exhibit a natural diminishing returns condition, have recently been proposed as a broad class of non-convex functions which may be efficiently optimized. Although online methods have been introduced, they suffer from similar problems. In this work, we propose Meta-Frank-Wolfe, the first online projection-free algorithm that uses stochastic gradient estimates. The algorithm relies on a careful sampling of gradients in each round and achieves the optimal $O(\sqrt{T})$ adversarial regret bounds for convex and continuous submodular optimization. We also propose One-Shot Frank-Wolfe, a simpler algorithm which requires only a single stochastic gradient estimate in each round and achieves an $O(T^{2/3})$ stochastic regret bound for convex and continuous submodular optimization. We apply our methods to develop a novel "lifting" framework for online discrete submodular maximization, and show that they outperform current state-of-the-art techniques on an extensive set of experiments.
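
As a hedged sketch of the projection-free idea, the following implements an online Frank-Wolfe update over the probability simplex using one stochastic gradient per round with a running gradient average; the paper's Meta-Frank-Wolfe and One-Shot Frank-Wolfe differ in their step sizes and averaging details, and the toy objective here is an assumption:

```python
# Projection-free (Frank-Wolfe style) updates with stochastic gradients.
import numpy as np

def linear_oracle_simplex(g):
    # argmin over the simplex of <g, v> is the vertex at the smallest coordinate
    v = np.zeros_like(g)
    v[np.argmin(g)] = 1.0
    return v

rng = np.random.default_rng(0)
target = np.array([0.6, 0.1, 0.1, 0.1, 0.1])

def stochastic_gradient(x):
    # noisy gradient of the toy objective f(x) = ||x - target||^2 (an assumption)
    return 2 * (x - target) + 0.1 * rng.normal(size=x.size)

d, T = 5, 200
x = np.full(d, 1.0 / d)                              # simplex barycenter
g_avg = np.zeros(d)
for t in range(1, T + 1):
    g_avg += (stochastic_gradient(x) - g_avg) / t    # running gradient average
    v = linear_oracle_simplex(g_avg)                 # linear optimization step
    x += (2.0 / (t + 2)) * (v - x)                   # convex step: no projection
```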

[22]  arXiv:1802.08229 [pdf, ps, other]
Title: A Better (Bayesian) Interval Estimate for Within-Subject Designs
Subjects: Methodology (stat.ME)

We develop a Bayesian highest-density interval (HDI) for use in within-subject designs. This credible interval is based on a standard noninformative prior and a modified posterior distribution that conditions on both the data and point estimates of the subject-specific random effects. Conditioning on the estimated random effects removes between-subject variance and produces intervals that are the Bayesian analogue of the within-subject confidence interval proposed in Loftus and Masson (1994). We show that the latter interval can also be derived as a Bayesian within-subject HDI under a certain improper prior. We argue that the proposed new interval is superior to the original within-subject confidence interval on the grounds that (a) it is based on a more sensible prior, (b) it has a clear and intuitively appealing interpretation, and (c) its length is always smaller. A generalization of the new interval that can be applied to heteroscedastic data is also derived, and we show that the resulting interval is numerically equivalent to the normalization method discussed in Franz and Loftus (2012); however, our work provides a Bayesian formulation for the normalization method, and in doing so we identify the associated prior distribution.

[23]  arXiv:1802.08238 [pdf]
Title: What are the most important factors that influence the changes in London Real Estate Prices? How to quantify them?
Authors: Yiyang Gu
Subjects: Applications (stat.AP); General Finance (q-fin.GN)

In recent years, the real estate industry has captured government and public attention around the world. The factors influencing real estate prices are diversified and complex. However, due to the limitations and one-sidedness of their respective views, existing studies have not provided a sufficient theoretical basis for the fluctuation of house prices and its influential factors. The purpose of this paper is to build a housing price model to make a scientific and objective analysis of London's real estate market trends from 1996 to 2016, and to propose some countermeasures to reasonably control house prices. Specifically, the paper analyzes eight factors which affect house prices from the two aspects of housing supply and demand, and finds the factor which is of vital importance to the increase of housing price per square meter. The problem of a high level of multicollinearity between them is solved by using principal component analysis.
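
The multicollinearity remedy mentioned above corresponds to regressing on principal components rather than on the raw factors. A minimal sketch with synthetic stand-in data (the paper's eight factors and London price series are not reproduced):

```python
# Principal component regression to handle collinear predictors.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
base = rng.normal(size=(21, 2))                     # 21 years, 2 latent drivers
X = base @ rng.normal(size=(2, 8)) + 0.05 * rng.normal(size=(21, 8))  # 8 collinear factors
y = base @ np.array([3.0, -1.0]) + rng.normal(scale=0.1, size=21)     # toy price index

model = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression())
model.fit(X, y)
print(model.score(X, y))
```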

[24]  arXiv:1802.08242 [pdf, other]
Title: Structured low-rank matrix completion for forecasting in time series analysis
Comments: 25 pages, 12 figures
Subjects: Methodology (stat.ME); Systems and Control (cs.SY); Numerical Analysis (math.NA); Machine Learning (stat.ML)

In this paper we consider the low-rank matrix completion problem with specific application to forecasting in time series analysis. Briefly, the low-rank matrix completion problem is the problem of imputing missing values of a matrix under a rank constraint. We consider a matrix completion problem for Hankel matrices and a convex relaxation based on the nuclear norm. Based on new theoretical results and a number of numerical and real examples, we investigate the cases when the proposed approach can work. Our results highlight the importance of choosing a proper weighting scheme for the known observations.
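
A minimal sketch of the idea, assuming cvxpy and scipy are available: the values to forecast are treated as missing entries of a Hankel matrix and imputed by nuclear-norm minimization. For brevity this sketch constrains only the known entries and omits both the Hankel-structure constraint on the variable and the weighting schemes the paper shows to be important:

```python
# Nuclear-norm Hankel matrix completion for time-series forecasting (sketch).
import numpy as np
import cvxpy as cp
from scipy.linalg import hankel

series = np.sin(0.3 * np.arange(20))            # observed series (toy)
horizon = 3                                     # number of values to forecast
full = np.concatenate([series, np.full(horizon, np.nan)])

H = hankel(full[:8], full[7:])                  # Hankel matrix; NaNs mark the future
mask = (~np.isnan(H)).astype(float)
H0 = np.nan_to_num(H)

X = cp.Variable(H.shape)
problem = cp.Problem(cp.Minimize(cp.norm(X, "nuc")),          # convex rank surrogate
                     [cp.multiply(mask, X) == cp.multiply(mask, H0)])
problem.solve()
forecast = X.value[-1, -horizon:]               # imputed entries = forecasts
```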

[25]  arXiv:1802.08246 [pdf, other]
Title: Characterizing Implicit Bias in Terms of Optimization Geometry
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We study the bias of generic optimization methods, including Mirror Descent, Natural Gradient Descent and Steepest Descent with respect to different potentials and norms, when optimizing underdetermined linear regression or separable linear classification problems. We ask the question of whether the global minimum (among the many possible global minima) reached by optimization algorithms can be characterized in terms of the potential or norm, and independently of hyperparameter choices such as step size and momentum.

Cross-lists for Fri, 23 Feb 18

[26]  arXiv:1802.07444 (cross-list from cs.LG) [pdf, other]
Title: Scaling-up Split-Merge MCMC with Locality Sensitive Sampling (LSS)
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Methodology (stat.ME); Machine Learning (stat.ML)

Split-Merge MCMC (Markov chain Monte Carlo) is one of the essential and popular variants of MCMC for problems in which an MCMC state consists of an unknown number of components or clusters. It is well known that state-of-the-art methods for split-merge MCMC do not scale well. Strategies for rapid mixing require smart, informative proposals to reduce the rejection rate. However, all known smart proposals involve a cost at least linear in the size of the data, $O(N)$, to suggest informative transitions. Thus, the cost of each iteration is prohibitive for massive-scale datasets. It is further known that uninformative but computationally efficient proposals, such as random split-merge, lead to extremely slow convergence. This tradeoff between mixing time and per-update cost seems hard to get around. In this paper, we get around this tradeoff by utilizing simple similarity information, such as cosine similarity, between the entity vectors to design a proposal distribution. Such information is readily available in almost all applications. We show that the recent use of locality sensitive hashing for efficient adaptive sampling can be leveraged to obtain a computationally efficient pseudo-marginal MCMC. The new split-merge MCMC has a constant-time update, just like random split-merge, and at the same time the proposal is informative and needs significantly fewer iterations than random split-merge. Overall, we obtain a sweet tradeoff between convergence and per-update cost. As a direct consequence, our proposal, named LSHSM, is around 10x faster than state-of-the-art sampling methods on both synthetic datasets and two large real datasets, KDDCUP and PubMed, with several millions of entities and thousands of cluster centers.

[27]  arXiv:1802.07796 (cross-list from cs.CV) [pdf, other]
Title: Continuous Relaxation of MAP Inference: A Nonconvex Perspective
Comments: Accepted for publication at the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we study a nonconvex continuous relaxation of MAP inference in discrete Markov random fields (MRFs). We show that for arbitrary MRFs, this relaxation is tight, and a discrete stationary point of it can be easily reached by a simple block coordinate descent algorithm. In addition, we study the resolution of this relaxation using popular gradient methods, and further propose a more effective solution using a multilinear decomposition framework based on the alternating direction method of multipliers (ADMM). Experiments on many real-world problems demonstrate that the proposed ADMM significantly outperforms other nonconvex relaxation based methods, and compares favorably with state of the art MRF optimization algorithms in different settings.

[28]  arXiv:1802.07814 (cross-list from cs.LG) [pdf, other]
Title: Learning to Explain: An Information-Theoretic Perspective on Model Interpretation
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We introduce instancewise feature selection as a methodology for model interpretation. Our method is based on learning a function to extract a subset of features that are most informative for each given example. This feature selector is trained to maximize the mutual information between selected features and the response variable, where the conditional distribution of the response variable given the input is the model to be explained. We develop an efficient variational approximation to the mutual information, and show that the resulting method compares favorably to other model explanation methods on a variety of synthetic and real data sets using both quantitative metrics and human evaluation.

[29]  arXiv:1802.07833 (cross-list from cs.LG) [pdf, ps, other]
Title: Variational Inference for Policy Gradient
Authors: Tianbing Xu
Comments: 7 pages
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Inspired by the seminal work on Stein Variational Inference and Stein Variational Policy Gradient, we derive a method to generate samples from the posterior variational parameter distribution by \textit{explicitly} minimizing the KL divergence to match the target distribution in an amortized fashion. We then apply this variational inference technique to vanilla policy gradient, TRPO and PPO with Bayesian Neural Network parameterizations for reinforcement learning problems.

[30]  arXiv:1802.07834 (cross-list from q-bio.PE) [pdf, other]
Title: Learning to Gather without Communication
Comments: Preliminary version, presented at the 5th Biological Distributed Algorithms Workshop. Washington D.C., July 28th, 2017
Subjects: Populations and Evolution (q-bio.PE); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG); Multiagent Systems (cs.MA); Machine Learning (stat.ML)

A standard belief on emerging collective behavior is that it emerges from simple individual rules. Most of the mathematical research on such collective behavior starts from imperative individual rules, like "always go to the center". But how could an (optimal) individual rule emerge during a short period within the group lifetime, especially if communication is not available? We argue that such rules can actually emerge in a group in a short span of time via collective (multi-agent) reinforcement learning, i.e., learning via rewards and punishments. We consider the gathering problem: several agents (social animals, swarming robots...) must gather around the same position, which is not determined in advance. They must do so without communicating their planned decisions, just by looking at the positions of the other agents. We present the first experimental evidence that a gathering behavior can be learned without communication in a partially observable environment. The learned behavior has the same properties as a self-stabilizing distributed algorithm, as processes can gather from any initial state (and thus tolerate any transient failure). Besides, we show that it is possible to tolerate the brutal loss of up to 90\% of agents without significant impact on the behavior.

[31]  arXiv:1802.07877 (cross-list from cs.LG) [pdf, other]
Title: Pooling homogeneous ensembles to build heterogeneous ensembles
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

In ensemble methods, the outputs of a collection of diverse classifiers are combined in the expectation that the global prediction will be more accurate than the individual ones. Heterogeneous ensembles consist of predictors of different types, which are likely to have different biases. If these biases are complementary, the combination of their decisions is beneficial. In this work, a family of heterogeneous ensembles is built by pooling classifiers from M homogeneous ensembles of different types, each of size T. Depending on the fraction of base classifiers of each type, a particular heterogeneous combination in this family is represented by a point in a regular simplex in M dimensions. The M vertices of this simplex represent the different homogeneous ensembles. A displacement away from one of these vertices effects a smooth transformation of the corresponding homogeneous ensemble into a heterogeneous one. The optimal composition of such a heterogeneous ensemble can be determined using cross-validation or, if bootstrap samples are used to build the individual classifiers, out-of-bag data. An empirical analysis of such combinations of bootstrapped ensembles composed of neural networks, SVMs, and random trees (i.e. from a standard random forest) illustrates the gains that can be achieved by this heterogeneous ensemble creation method.

[32]  arXiv:1802.07889 (cross-list from cs.LG) [pdf, ps, other]
Title: Entropy Rate Estimation for Markov Chains with Large State Space
Subjects: Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

Estimating the entropy based on data is one of the prototypical problems in distribution property testing and estimation. For estimating the Shannon entropy of a distribution on $S$ elements with independent samples, [Paninski2004] showed that the sample complexity is sublinear in $S$, and [Valiant--Valiant2011] showed that consistent estimation of Shannon entropy is possible if and only if the sample size $n$ far exceeds $\frac{S}{\log S}$. In this paper we consider the problem of estimating the entropy rate of a stationary reversible Markov chain with $S$ states from a sample path of $n$ observations. We show that:
(1) As long as the Markov chain mixes not too slowly, i.e., the relaxation time is at most $O(\frac{S}{\ln^3 S})$, consistent estimation is achievable when $n \gg \frac{S^2}{\log S}$.
(2) As long as the Markov chain has some slight dependency, i.e., the relaxation time is at least $1+\Omega(\frac{\ln^2 S}{\sqrt{S}})$, consistent estimation is impossible when $n \lesssim \frac{S^2}{\log S}$.
Under both assumptions, the optimal estimation accuracy is shown to be $\Theta(\frac{S^2}{n \log S})$. In comparison, the empirical entropy rate requires at least $\Omega(S^2)$ samples to be consistent, even when the Markov chain is memoryless. In addition to synthetic experiments, we also apply the estimators that achieve the optimal sample complexity to estimate the entropy rate of the English language in the Penn Treebank and the Google One Billion Words corpora, which provides a natural benchmark for language modeling and relates it directly to the widely used perplexity measure.
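
For reference, the plug-in (empirical) entropy rate that the paper improves upon can be computed from bigram counts along the sample path, as in this sketch:

```python
# Plug-in entropy rate of a Markov chain: H = -sum_i pi_i sum_j P_ij log P_ij.
import numpy as np

def empirical_entropy_rate(path, S):
    counts = np.zeros((S, S))
    for a, b in zip(path[:-1], path[1:]):
        counts[a, b] += 1                      # bigram (transition) counts
    pi = counts.sum(axis=1) / counts.sum()     # empirical stationary distribution
    P = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    logP = np.where(P > 0, np.log(np.where(P > 0, P, 1.0)), 0.0)
    return -np.sum(pi[:, None] * P * logP)

rng = np.random.default_rng(0)
path = rng.integers(0, 4, size=10000)          # i.i.d. uniform = memoryless chain
print(empirical_entropy_rate(path, S=4))       # close to log 4
```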

[33]  arXiv:1802.07917 (cross-list from cs.LG) [pdf, ps, other]
Title: Regional Multi-Armed Bandits
Comments: AISTATS 2018
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We consider a variant of the classic multi-armed bandit problem where the expected reward of each arm is a function of an unknown parameter. The arms are divided into different groups, each of which has a common parameter. Therefore, when the player selects an arm at each time slot, information about the other arms in the same group is also revealed. This regional bandit model naturally bridges the non-informative bandit setting, where the player can only learn about the chosen arm, and the global bandit model, where sampling one arm reveals information about all arms. We propose an efficient algorithm, UCB-g, that solves the regional bandit problem by combining the Upper Confidence Bound (UCB) and greedy principles. Both parameter-dependent and parameter-free regret upper bounds are derived. We also establish a matching lower bound, which proves the order-optimality of UCB-g. Moreover, we propose SW-UCB-g, an extension of UCB-g for a non-stationary environment where the parameters vary slowly over time.

[34]  arXiv:1802.07935 (cross-list from math.OC) [pdf, ps, other]
Title: Asynchronous stochastic approximations with asymptotically biased errors and deep multi-agent learning
Subjects: Optimization and Control (math.OC); Dynamical Systems (math.DS); Machine Learning (stat.ML)

Asynchronous stochastic approximations are an important class of model-free algorithms that are readily applicable to multi-agent reinforcement learning (RL) and distributed control applications. When the system size is large, the aforementioned algorithms are used in conjunction with function approximations. In this paper, we present a complete analysis, including stability (almost sure boundedness) and convergence, of asynchronous stochastic approximations with asymptotically bounded biased errors, under easily verifiable sufficient conditions. As an application, we analyze the Policy Gradient algorithms and the more general Value Iteration based algorithms with noise. These are popular reinforcement learning algorithms due to their simplicity and effectiveness. Specifically, we analyze the asynchronous approximate counterpart of policy gradient (A2PG) and value iteration (A2VI) schemes. It is shown that the stability of these algorithms remains unaffected when the approximation errors are guaranteed to be asymptotically bounded, although possibly biased. Regarding convergence of A2VI, it is shown to converge to a fixed point of the perturbed Bellman operator when balanced step-sizes are used. Further, a relationship between these fixed points and the approximation errors is established. A similar analysis for A2PG is also presented.

[35]  arXiv:1802.07971 (cross-list from cs.LG) [pdf, other]
Title: Robustness of classifiers to uniform $\ell_p$ and Gaussian noise
Journal-ref: 21st International Conference on Artificial Intelligence and Statistics (AISTATS) 2018, Apr 2018, Playa Blanca, Spain. http://www.aistats.org/
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We study the robustness of classifiers to various kinds of random noise models. In particular, we consider noise drawn uniformly from the $\ell_p$ ball for $p \in [1, \infty]$ and Gaussian noise with an arbitrary covariance matrix. We characterize this robustness to random noise in terms of the distance to the decision boundary of the classifier. This analysis applies to linear classifiers as well as classifiers with locally approximately flat decision boundaries, a condition which is satisfied by state-of-the-art deep neural networks. The predicted robustness is verified experimentally.

[36]  arXiv:1802.07995 (cross-list from math.PR) [pdf, ps, other]
Title: Multidimensional multiscale scanning in Exponential Families: Limit theory and statistical consequences
Subjects: Probability (math.PR); Statistics Theory (math.ST); Methodology (stat.ME)

In this paper we consider the problem of finding anomalies in a $d$-dimensional field of independent random variables $\{Y_i\}_{i \in \left\{1,...,n\right\}^d}$, each distributed according to a one-dimensional natural exponential family $\mathcal F = \left\{F_\theta\right\}_{\theta \in\Theta}$. Given some baseline parameter $\theta_0 \in\Theta$, the field is scanned using local likelihood ratio tests to detect from a (large) given system of regions $\mathcal{R}$ those regions $R \subset \left\{1,...,n\right\}^d$ with $\theta_i \neq \theta_0$ for some $i \in R$. We provide a unified methodology which controls the overall family-wise error rate (FWER) of making a wrong detection at a given error rate.
Fundamental to our method is a Gaussian approximation of the asymptotic distribution of the underlying multiscale scanning test statistic, with an explicit rate of convergence. From this, we obtain a weak limit theorem which can be seen as a generalized weak invariance principle for non-identically distributed data and is of independent interest. Furthermore, we give an asymptotic expansion of the procedure's power, which yields minimax optimality in the case of Gaussian observations.

[37]  arXiv:1802.08009 (cross-list from cs.LG) [pdf, ps, other]
Title: Iterate averaging as regularization for stochastic gradient descent
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

We propose and analyze a variant of the classic Polyak-Ruppert averaging scheme, broadly used in stochastic gradient methods. Rather than a uniform average of the iterates, we consider a weighted average, with weights decaying in a geometric fashion. In the context of linear least squares regression, we show that this averaging scheme has the same regularizing effect as, and is indeed asymptotically equivalent to, ridge regression. In particular, we derive finite-sample bounds for the proposed approach that match the best known results for regularized stochastic gradient methods.
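
A minimal sketch of the scheme on a toy least-squares problem: a running average of SGD iterates with geometrically decaying weights at rate gamma (gamma close to 1 approaches the uniform Polyak-Ruppert average); the toy problem and constants are illustrative:

```python
# SGD with a geometrically weighted running average of the iterates.
import numpy as np

def sgd_geometric_average(grad, x0, lr=0.01, gamma=0.99, steps=2000):
    x, avg, wsum = x0.copy(), np.zeros_like(x0), 0.0
    for _ in range(steps):
        x -= lr * grad(x)
        wsum = gamma * wsum + 1.0      # geometric weights: recent iterates count more
        avg += (x - avg) / wsum        # running weighted average of the iterates
    return avg

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))
b = A @ np.ones(5)

def sgd_grad(x):
    i = rng.integers(100)              # single-sample least-squares gradient
    return (A[i] @ x - b[i]) * A[i]

x_hat = sgd_geometric_average(sgd_grad, np.zeros(5))
```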

[38]  arXiv:1802.08013 (cross-list from cs.AI) [pdf, other]
Title: Intrinsic Motivation and Mental Replay enable Efficient Online Adaptation in Stochastic Recurrent Networks
Comments: Preprint submitted to Neural Networks
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO); Machine Learning (stat.ML)

Autonomous robots need to interact with unknown, unstructured and changing environments, constantly facing novel challenges. Therefore, continuous online adaptation for lifelong learning and sample-efficient mechanisms to adapt to changes in the environment, the constraints, the tasks, or the robot itself are crucial. In this work, we propose a novel framework for probabilistic online motion planning with online adaptation, based on a bio-inspired stochastic recurrent neural network. By using learning signals which mimic the intrinsic motivation signal of cognitive dissonance, together with a mental replay strategy to intensify experiences, the stochastic recurrent network can learn from few physical interactions and adapt to novel environments within seconds. We evaluate our online planning and adaptation framework on an anthropomorphic KUKA LWR arm. The rapid online adaptation is shown by learning unknown workspace constraints sample-efficiently from few physical interactions while following given via points.

[39]  arXiv:1802.08021 (cross-list from cs.DC) [pdf, other]
Title: SparCML: High-Performance Sparse Communication for Machine Learning
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)

One of the main drivers behind the rapid recent advances in machine learning has been the availability of efficient system support. This comes both through hardware acceleration and in the form of efficient software frameworks and programming models. Despite significant progress, scaling compute-intensive machine learning workloads to a large number of compute nodes is still a challenging task, with significant latency and bandwidth demands. In this paper, we address this challenge by proposing SPARCML, a general, scalable communication layer for machine learning applications. SPARCML is built on the observation that many distributed machine learning algorithms either have naturally sparse communication patterns, or have updates which can be sparsified in a structured way for improved performance, without any convergence or accuracy loss. To exploit this insight, we design and implement a set of communication-efficient protocols for sparse input data, in conjunction with efficient machine learning algorithms which can leverage these primitives. Our communication protocols generalize standard collective operations by allowing processes to contribute sparse input data vectors of heterogeneous sizes. We call these operations sparse-input collectives, and present efficient practical algorithms with strong theoretical bounds on their running time and communication cost. Our generic communication layer is enriched with additional features, such as support for non-blocking (asynchronous) operations and support for low-precision data representations. We validate our algorithmic results experimentally on a range of large-scale machine learning applications and target architectures, showing that we can leverage sparsity for order-of-magnitude runtime savings, compared to state-of-the-art methods and frameworks.

[40]  arXiv:1802.08061 (cross-list from econ.EM) [pdf, ps, other]
Title: Algorithmic Collusion in Cournot Duopoly Market: Evidence from Experimental Economics
Comments: 22 pages, 7 figures; algorithmic collusion; Cournot duopoly model; experimental economics; game theory; collusion algorithm design; iterated prisoner's dilemma; antitrust; mechanism design
Subjects: Econometrics (econ.EM); Computer Science and Game Theory (cs.GT); Applications (stat.AP); Machine Learning (stat.ML)

Algorithmic collusion is an emerging concept in the current age of artificial intelligence. Whether algorithmic collusion is a credible threat remains an open argument. In this paper, we propose an algorithm which can extort its human rival into colluding in a Cournot duopoly market. In experiments, we show that the algorithm successfully extorts its human rival and obtains a higher profit in the long run, while the human rival ends up fully colluding with the algorithm. As a result, social welfare declines rapidly and stably. Both in theory and in experiment, our work confirms that algorithmic collusion can be a credible threat. In application, we hope that the framework, the algorithm design, and the experiment environment illustrated in this work can serve as an incubator or a test bed for researchers and policymakers to handle emerging algorithmic collusion.

[41]  arXiv:1802.08089 (cross-list from math.OC) [pdf, ps, other]
Title: Sampling as optimization in the space of measures: The Langevin dynamics as a composite optimization problem
Authors: Andre Wibisono
Comments: 44 pages
Subjects: Optimization and Control (math.OC); Information Theory (cs.IT); Learning (cs.LG); Machine Learning (stat.ML)

We study sampling as optimization in the space of measures. We focus on gradient flow-based optimization, with the Langevin dynamics as a case study. We investigate the source of the bias of the unadjusted Langevin algorithm (ULA) in discrete time, and consider how to remove or reduce the bias. We point out that the difficulty is that the heat flow is exactly solvable, but neither its forward nor its backward method is implementable in general, except for Gaussian data. We propose the symmetrized Langevin algorithm (SLA), which should have a smaller bias than ULA, at the price of implementing a proximal gradient step in space. We show SLA is in fact consistent for a Gaussian target measure, whereas ULA is not. We also illustrate the various algorithms explicitly for a Gaussian target measure, including gradient descent, proximal gradient, and Forward-Backward, and show they are all consistent.
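
For reference, the ULA iteration analyzed above is $x_{k+1} = x_k - h\,\nabla U(x_k) + \sqrt{2h}\,\xi_k$ with $\xi_k \sim N(0, I)$, targeting (approximately) the measure proportional to $\exp(-U)$. A minimal sketch on a standard Gaussian target, where the discretization bias is visible in the stationary variance:

```python
# Unadjusted Langevin algorithm (ULA) on a standard Gaussian target.
import numpy as np

def ula(grad_U, x0, step, n_steps, rng):
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        x = x - step * grad_U(x) + np.sqrt(2 * step) * rng.normal(size=x.shape)
        samples.append(x.copy())
    return np.array(samples)

# target: U(x) = ||x||^2 / 2, so grad U(x) = x and the target variance is 1
rng = np.random.default_rng(0)
samples = ula(lambda x: x, np.zeros(2), step=0.1, n_steps=5000, rng=rng)
print(samples[1000:].var(axis=0))   # ~ 1/(1 - step/2) ≈ 1.05, not 1: the ULA bias
```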

[42]  arXiv:1802.08194 (cross-list from physics.soc-ph) [pdf, other]
Title: Seeing the forest for the trees? An investigation of network knowledge
Comments: 10 Figures and Tables
Subjects: Physics and Society (physics.soc-ph); Applications (stat.AP); Other Statistics (stat.OT)

This paper assesses the empirical content of one of the most prevalent assumptions in the economics of networks literature, namely the assumption that decision makers have full knowledge about the networks they interact on. Using network data from 75 villages, we ask 4,554 individuals to assess whether five randomly chosen pairs of households in their village are linked through financial, social, and informational relationships. We find that network knowledge is low and highly localized, declining steeply with the pair's network distance to the respondent. 46% of respondents are not even able to offer a guess about the status of a potential link between a given pair of individuals. Even when willing to offer a guess, respondents can only correctly identify the links 37% of the time. We also find that a one-step increase in the social distance to the pair corresponds to a 10pp increase in the probability of misidentifying the link. We then investigate the theoretical implications of this assumption by showing that the predictions of various models change substantially if agents behave under the more realistic assumption of incomplete knowledge about the network. Taken together, our results suggest that the assumption of full network knowledge (i) may serve as a poor approximation to the real world and (ii) is not innocuous: allowing for incomplete network knowledge may have first-order implications for a range of qualitative and quantitative results in various contexts.

[43]  arXiv:1802.08195 (cross-list from cs.LG) [pdf, other]
Title: Adversarial Examples that Fool both Human and Computer Vision
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)

Machine learning models are vulnerable to adversarial examples: small changes to images can cause computer vision models to make mistakes such as identifying a school bus as an ostrich. However, it is still an open question whether humans are prone to similar mistakes. Here, we create the first adversarial examples designed to fool humans, by leveraging recent techniques that transfer adversarial examples from computer vision models with known parameters and architecture to other models with unknown parameters and architecture, and by modifying models to more closely match the initial processing of the human visual system. We find that adversarial examples that strongly transfer across computer vision models influence the classifications made by time-limited human observers.

[44]  arXiv:1802.08235 (cross-list from cs.LG) [pdf, other]
Title: Vector Field Based Neural Networks
Comments: 6 pages, 5 figures. To appear in the Proceedings of the 26th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

A novel Neural Network architecture is proposed using the mathematically and physically rich idea of vector fields as hidden layers to perform nonlinear transformations of the data. The data points are interpreted as particles moving along a flow defined by the vector field, which intuitively represents the desired movement to enable classification. The architecture moves the data points from their original configuration to a new one, following the streamlines of the vector field, with the objective of achieving a final configuration where the classes are separable. An optimization problem is solved through gradient descent to learn this vector field.

[45]  arXiv:1802.08241 (cross-list from cs.CV) [pdf, other]
Title: Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Comments: 24 pages, 13 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

Training neural networks with large batch sizes has been shown to incur accuracy loss under current methods, and the precise underlying reasons are still not completely understood. Here, we study large-batch training through the lens of the Hessian operator and robust optimization. In particular, we perform a Hessian-based study to analyze how the landscape of the loss function differs under large-batch training. We compute the true Hessian spectrum, without approximation, by back-propagating the second derivative. Our results on multiple networks show that training with large batch sizes tends to stop at points in parameter space with a noticeably larger Hessian spectrum, i.e., where the eigenvalues of the Hessian are much larger. We then study how batch size affects the robustness of the model to adversarial attacks. All results show that models trained with large batches are more susceptible to adversarial attacks than models trained with small batch sizes. Furthermore, we prove a theoretical result showing that finding an adversarial perturbation is a saddle-free optimization problem. Finally, we present empirical results demonstrating that adversarial training leads to regions with a smaller Hessian spectrum. We report detailed experiments with five different network architectures on the MNIST, CIFAR-10, and CIFAR-100 datasets.
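
Extracting the top of the Hessian spectrum by back-propagating second derivatives is commonly done with Hessian-vector products plus power iteration. Below is a minimal PyTorch sketch of that generic recipe on a tiny placeholder model; it illustrates the technique, not the authors' code, and the model and data are invented.

    # Power iteration for the top Hessian eigenvalue via double backprop.
    import torch

    torch.manual_seed(0)
    model = torch.nn.Sequential(torch.nn.Linear(10, 16), torch.nn.Tanh(),
                                torch.nn.Linear(16, 2))
    X, y = torch.randn(64, 10), torch.randint(0, 2, (64,))
    loss = torch.nn.functional.cross_entropy(model(X), y)

    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params, create_graph=True)  # keep graph

    def hvp(v):
        # Hessian-vector product: differentiate (grad . v) a second time.
        dot = sum((g * vi).sum() for g, vi in zip(grads, v))
        return torch.autograd.grad(dot, params, retain_graph=True)

    v = [torch.randn_like(p) for p in params]
    for _ in range(50):                       # power iteration
        Hv = hvp(v)
        norm = torch.sqrt(sum((h ** 2).sum() for h in Hv))
        v = [h / norm for h in Hv]

    # Rayleigh quotient v^T H v with the normalized iterate:
    eigenvalue = sum((h * vi).sum() for h, vi in zip(hvp(v), v)).item()
    print("estimated top Hessian eigenvalue:", eigenvalue)

Comparing this leading eigenvalue across small- and large-batch training runs is the kind of diagnostic the paper's Hessian-based study relies on.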

Replacements for Fri, 23 Feb 18

[46]  arXiv:1503.05436 (replaced) [pdf, other]
Title: Inference in Additively Separable Models With a High-Dimensional Set of Conditioning Variables
Authors: Damian Kozbur
Subjects: Statistics Theory (math.ST)
[47]  arXiv:1609.04558 (replaced) [pdf, ps, other]
Title: Statistical Inference in a Directed Network Model with Covariates
Comments: 31 pages. Revised
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
[48]  arXiv:1702.08431 (replaced) [pdf, other]
Title: Boundary-Seeking Generative Adversarial Networks
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
[49]  arXiv:1703.01610 (replaced) [pdf, ps, other]
Title: Improving Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms and Its Applications
Authors: Qinshi Wang, Wei Chen
Comments: This is the full version of the paper accepted at NIPS'2017
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
[50]  arXiv:1704.01665 (replaced) [pdf, other]
Title: Learning Combinatorial Optimization Algorithms over Graphs
Comments: NIPS 2017
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
[51]  arXiv:1704.02381 (replaced) [pdf, other]
Title: Adaptive estimation of the rank of the coefficient matrix in high dimensional multivariate response regression models
Subjects: Methodology (stat.ME)
[52]  arXiv:1704.07987 (replaced) [pdf, other]
Title: Training L1-Regularized Models with Orthant-Wise Passive Descent Algorithms
Authors: Jianqiao Wangni
Comments: Accepted to The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18). Feb 2018, New Orleans
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
[53]  arXiv:1705.04293 (replaced) [pdf, other]
Title: Bayesian Approaches to Distribution Regression
Comments: Final version to be published at AISTATS 2018
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
[54]  arXiv:1705.07606 (replaced) [pdf, other]
Title: Guide Actor-Critic for Continuous Control
Comments: ICLR 2018
Subjects: Machine Learning (stat.ML)
[55]  arXiv:1705.10119 (replaced) [pdf, other]
Title: Kernel Implicit Variational Inference
Comments: Published as a conference paper at ICLR 2018
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[56]  arXiv:1706.06491 (replaced) [pdf, other]
Title: Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control
Comments: Accepted at AISTATS 2018
Subjects: Systems and Control (cs.SY); Machine Learning (stat.ML)
[57]  arXiv:1706.06878 (replaced) [pdf, ps, other]
Title: An Unsupervised Method for Estimating the Global Horizontal Irradiance from Photovoltaic Power Measurements
Journal-ref: Solar Energy Volume 158, December 2017, Pages 701-710
Subjects: Machine Learning (stat.ML)
[58]  arXiv:1707.00167 (replaced) [pdf, other]
Title: Asymptotic Distribution-Free Change-Point Detection for Multivariate and non-Euclidean Data
Authors: Lynna Chu, Hao Chen
Subjects: Methodology (stat.ME)
[59]  arXiv:1707.06315 (replaced) [pdf, other]
Title: FLAME: A Fast Large-scale Almost Matching Exactly Approach to Causal Inference
Subjects: Machine Learning (stat.ML); Databases (cs.DB)
[60]  arXiv:1709.01449 (replaced) [pdf, other]
Title: Visualization in Bayesian workflow
Comments: 17 pages, 11 Figures. Includes supplementary material
Subjects: Methodology (stat.ME); Applications (stat.AP)
[61]  arXiv:1710.04908 (replaced) [pdf, other]
Title: Graph Convolutional Networks for Classification with a Structured Label Space
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
[62]  arXiv:1710.07283 (replaced) [pdf, other]
Title: Decomposition of Uncertainty in Bayesian Deep Learning for Efficient and Risk-sensitive Learning
Comments: This paper supersedes arXiv:1706.08495
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
[63]  arXiv:1710.08864 (replaced) [pdf, other]
Title: One pixel attack for fooling deep neural networks
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[64]  arXiv:1710.10704 (replaced) [pdf, other]
Title: Training Probabilistic Spiking Neural Networks with First-to-spike Decoding
Comments: A shorter version will be published on Proc. IEEE ICASSP 2018
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[65]  arXiv:1711.01796 (replaced) [pdf, ps, other]
Title: Independently Interpretable Lasso: A New Regularizer for Sparse Regression with Uncorrelated Variables
Subjects: Machine Learning (stat.ML)
[66]  arXiv:1711.08824 (replaced) [pdf, ps, other]
Title: The Nearest Neighbor Information Estimator is Adaptively Near Minimax Rate-Optimal
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT)
[67]  arXiv:1712.06575 (replaced) [pdf, other]
Title: Combinatorics of chemical reaction systems
Comments: 33+12 pages, 4 figures
Subjects: Mathematical Physics (math-ph); Combinatorics (math.CO); Probability (math.PR); Applications (stat.AP)
[68]  arXiv:1801.04003 (replaced) [pdf, ps, other]
Title: Some techniques in density estimation
Comments: 18 pages; new version includes tight results on mixtures of general Gaussians
Subjects: Statistics Theory (math.ST); Learning (cs.LG)
[69]  arXiv:1801.04695 (replaced) [pdf, other]
Title: Sparsity-based Defense against Adversarial Attacks on Linear Classifiers
Comments: Submitted to IEEE International Symposium on Information Theory (ISIT) 2018. ZM and SG are joint first authors
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)
[70]  arXiv:1802.01141 (replaced) [pdf, other]
Title: Simultaneous Selection of Multiple Important Single Nucleotide Polymorphisms in Familial Genome Wide Association Studies Data
Subjects: Applications (stat.AP)
[71]  arXiv:1802.03569 (replaced) [pdf, other]
Title: Riemannian Manifold Kernel for Persistence Diagrams
Authors: Tam Le, Makoto Yamada
Comments: fixed a misleading typo (p.6)
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Algebraic Topology (math.AT)
[72]  arXiv:1802.04852 (replaced) [pdf, other]
Title: Persistence Codebooks for Topological Data Analysis
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Algebraic Topology (math.AT)
[73]  arXiv:1802.05339 (replaced) [pdf, other]
Title: Two- and Multi-dimensional Curve Fitting using Bayesian Inference
Subjects: Data Analysis, Statistics and Probability (physics.data-an); Instrumentation and Methods for Astrophysics (astro-ph.IM); Statistics Theory (math.ST)
[74]  arXiv:1802.06931 (replaced) [pdf, other]
Title: Empirical Bayes Matrix Factorization
Subjects: Methodology (stat.ME)
[75]  arXiv:1802.07481 (replaced) [pdf, other]
Title: Dual Extrapolation for Faster Lasso Solvers
Subjects: Machine Learning (stat.ML)