python_feedgen09_jnboehm.com.atom.xml - sfeed_tests - sfeed tests and RSS and Atom files
 (HTM) git clone git://git.codemadness.org/sfeed_tests
       ---
       python_feedgen09_jnboehm.com.atom.xml (142453B)
       ---
            1 <?xml version='1.0' encoding='UTF-8'?>
            2 <feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><id>http://arxiv.org/</id><title>arxiv parsed</title><updated>2021-09-23T09:06:47.938938+00:00</updated><author><name>Jan Niklas Böhm</name><email>jan-niklas.boehm@uni-tuebingen.de</email></author><link href="http://arxiv.org" rel="alternate"/><link href="https://jnboehm.com" rel="self"/><generator uri="https://lkiesow.github.io/python-feedgen" version="0.9.0">python-feedgen</generator><subtitle>This parses the arxiv feed and filters interesting (to me) articles!</subtitle><entry><id>http://arxiv.org/abs/2109.09705</id><title>Neural forecasting at scale (update)</title><updated>2021-09-23T09:06:49.539266+00:00</updated><author><name>Philippe Chatigny</name></author><author><name>Shengrui Wang Jean-Marc Patenaude</name></author><author><name>Boris N. Oreshkin</name></author><link href="http://arxiv.org/abs/2109.09705" rel="alternate"/><summary>We study the problem of efficiently scaling ensemble-based deep neural
            3 networks for time series (TS) forecasting on a large set of time series.
            4 Current state-of-the-art deep ensemble models have high memory and
            5 computational requirements, hampering their use to forecast millions of TS in
            6 practical scenarios. We propose N-BEATS(P), a global multivariate variant of
            7 the N-BEATS model designed to allow simultaneous training of multiple
            8 univariate TS forecasting models. Our model addresses the practical limitations
            9 of related models, reducing the training time by half and memory requirement by
           10 a factor of 5, while keeping the same level of accuracy. We have performed
           11 multiple experiments detailing the various ways to train our model and have
           12 obtained results that demonstrate its capacity to support zero-shot TS
           13 forecasting, i.e., to train a neural network on a source TS dataset and deploy
           14 it on a different target TS dataset without retraining, which provides an
           15 efficient and reliable solution to forecast at scale even in difficult
           16 forecasting conditions.
           17 </summary></entry><entry><id>http://arxiv.org/abs/2109.02624</id><title>Functional additive regression on shape and form manifolds of planar curves (update)</title><updated>2021-09-23T09:06:49.538917+00:00</updated><author><name>Almond Stöcker</name></author><author><name>Sonja Greven</name></author><link href="http://arxiv.org/abs/2109.02624" rel="alternate"/><summary>Defining shape and form as equivalence classes under translation, rotation
           18 and -- for shapes -- also scale, we extend generalized additive regression to
           19 models for the shape/form of planar curves or landmark configurations. The
           20 model respects the resulting quotient geometry of the response, employing the
           21 squared geodesic distance as loss function and a geodesic response function
           22 mapping the additive predictor to the shape/form space. For fitting the model,
           23 we propose a Riemannian $L_2$-Boosting algorithm well-suited for a potentially
            24 large number of possibly parameter-intensive model terms, which also yields
           25 automated model selection. We provide novel intuitively interpretable
           26 visualizations for (even non-linear) covariate effects in the shape/form space
           27 via suitable tensor based factorizations. The usefulness of the proposed
           28 framework is illustrated in an analysis of 1) astragalus shapes of wild and
           29 domesticated sheep and 2) cell forms generated in a biophysical model, as well
           30 as 3) in a realistic simulation study with response shapes and forms motivated
           31 from a dataset on bottle outlines.
           32 </summary></entry><entry><id>http://arxiv.org/abs/2107.04136</id><title>Diagonal Nonlinear Transformations Preserve Structure in Covariance and Precision Matrices (update)</title><updated>2021-09-23T09:06:49.538501+00:00</updated><author><name>Rebecca E Morrison</name></author><author><name>Ricardo Baptista</name></author><author><name>Estelle L Basor</name></author><link href="http://arxiv.org/abs/2107.04136" rel="alternate"/><summary>For a multivariate normal distribution, the sparsity of the covariance and
           33 precision matrices encodes complete information about independence and
           34 conditional independence properties. For general distributions, the covariance
           35 and precision matrices reveal correlations and so-called partial correlations
           36 between variables, but these do not, in general, have any correspondence with
           37 respect to independence properties. In this paper, we prove that, for a certain
           38 class of non-Gaussian distributions, these correspondences still hold, exactly
           39 for the covariance and approximately for the precision. The distributions --
           40 sometimes referred to as "nonparanormal" -- are given by diagonal
           41 transformations of multivariate normal random variables. We provide several
           42 analytic and numerical examples illustrating these results.
           43 </summary></entry><entry><id>http://arxiv.org/abs/2106.09370</id><title>A deep generative model for probabilistic energy forecasting in power systems: normalizing flows (update)</title><updated>2021-09-23T09:06:49.538071+00:00</updated><author><name>Jonathan Dumas</name></author><author><name>Antoine Wehenkel Damien Lanaspeze</name></author><author><name>Bertrand Cornélusse</name></author><author><name>Antonio Sutera</name></author><link href="http://arxiv.org/abs/2106.09370" rel="alternate"/><summary>Greater direct electrification of end-use sectors with a higher share of
           44 renewables is one of the pillars to power a carbon-neutral society by 2050.
            45 However, in contrast to conventional power plants, renewable energy is subject
            46 to uncertainty, raising challenges for its interaction with power systems.
           47 Scenario-based probabilistic forecasting models have become a vital tool to
           48 equip decision-makers. This paper presents to the power systems forecasting
            49 practitioners a recent deep learning technique, normalizing flows, to
           50 produce accurate scenario-based probabilistic forecasts that are crucial to
           51 face the new challenges in power systems applications. The strength of this
           52 technique is to directly learn the stochastic multivariate distribution of the
           53 underlying process by maximizing the likelihood. Through comprehensive
           54 empirical evaluations using the open data of the Global Energy Forecasting
           55 Competition 2014, we demonstrate that this methodology is competitive with
           56 other state-of-the-art deep learning generative models: generative adversarial
           57 networks and variational autoencoders. The models producing weather-based wind,
           58 solar power, and load scenarios are properly compared in terms of forecast
           59 value by considering the case study of an energy retailer and quality using
           60 several complementary metrics. The numerical experiments are simple and easily
           61 reproducible. Thus, we hope it will encourage other forecasting practitioners
           62 to test and use normalizing flows in power system applications such as bidding
           63 on electricity markets, scheduling power systems with high renewable energy
            64 sources penetration, energy management of virtual power plants or microgrids, and
           65 unit commitment.
           66 </summary></entry><entry><id>http://arxiv.org/abs/2105.14367</id><title>Deconvolutional Density Network: Modeling Free-Form Conditional Distributions (update)</title><updated>2021-09-23T09:06:49.537668+00:00</updated><author><name>Bing Chen</name></author><author><name>Mazharul Islam</name></author><author><name>Jisuo Gao</name></author><author><name>Lin Wang</name></author><link href="http://arxiv.org/abs/2105.14367" rel="alternate"/><summary>Conditional density estimation (CDE) is the task of estimating the
           67 probability of an event conditioned on some inputs. A neural network (NN) can
            68 be used to compute the output distribution for a continuous domain, but it is
           69 difficult to explicitly approximate a free-form one without knowing the
           70 information of its general form a priori. In order to fit an arbitrary
           71 conditional distribution, discretizing the continuous domain into bins is an
            72 effective strategy, as long as we have sufficiently narrow bins and a very
            73 large amount of data. However, collecting enough data is often hard, and it
            74 falls far short of that ideal in many circumstances, especially in multivariate
            75 CDE due to the curse of dimensionality. In this paper, we demonstrate the benefits of
           76 modeling free-form conditional distributions using a deconvolution-based neural
           77 net framework, coping with data deficiency problems in discretization. It has
           78 the advantage of being flexible but also takes advantage of the hierarchical
           79 smoothness offered by the deconvolution layers. We compare our method to a
           80 number of other density-estimation approaches and show that our Deconvolutional
           81 Density Network (DDN) outperforms the competing methods on many univariate and
           82 multivariate tasks.
           83 </summary></entry><entry><id>http://arxiv.org/abs/2102.07767</id><title>Communication-efficient Distributed Cooperative Learning with Compressed Beliefs (update)</title><updated>2021-09-23T09:06:49.537320+00:00</updated><author><name>Mohammad Taha Toghani</name></author><author><name>César A. Uribe</name></author><link href="http://arxiv.org/abs/2102.07767" rel="alternate"/><summary>We study the problem of distributed cooperative learning, where a group of
           84 agents seeks to agree on a set of hypotheses that best describes a sequence of
           85 private observations. In the scenario where the set of hypotheses is large, we
           86 propose a belief update rule where agents share compressed (either sparse or
           87 quantized) beliefs with an arbitrary positive compression rate. Our algorithm
           88 leverages a unified communication rule that enables agents to access
           89 wide-ranging compression operators as black-box modules. We prove the almost
           90 sure asymptotic exponential convergence of beliefs around the set of optimal
           91 hypotheses. Additionally, we show a non-asymptotic, explicit, and linear
           92 concentration rate in probability of the beliefs on the optimal hypothesis set.
           93 We provide numerical experiments to illustrate the communication benefits of
           94 our method. The simulation results show that the number of transmitted bits can
            95 be reduced to 5-10% of that of the non-compressed method in the studied scenarios.
           96 </summary></entry><entry><id>http://arxiv.org/abs/2012.15059</id><title>Ensembles of Localised Models for Time Series Forecasting (update)</title><updated>2021-09-23T09:06:49.536891+00:00</updated><author><name>Rakshitha Godahewa</name></author><author><name>Kasun Bandara</name></author><author><name>Geoffrey I. Webb</name></author><author><name>Slawek Smyl</name></author><author><name>Christoph Bergmeir</name></author><link href="http://arxiv.org/abs/2012.15059" rel="alternate"/><summary>With large quantities of data typically available nowadays, forecasting
           97 models that are trained across sets of time series, known as Global Forecasting
           98 Models (GFM), are regularly outperforming traditional univariate forecasting
           99 models that work on isolated series. As GFMs usually share the same set of
          100 parameters across all time series, they often have the problem of not being
          101 localised enough to a particular series, especially in situations where
          102 datasets are heterogeneous. We study how ensembling techniques can be used with
          103 generic GFMs and univariate models to solve this issue. Our work systematises
          104 and compares relevant current approaches, namely clustering series and training
          105 separate submodels per cluster, the so-called ensemble of specialists approach,
          106 and building heterogeneous ensembles of global and local models. We fill some
          107 gaps in the existing GFM localisation approaches, in particular by
          108 incorporating varied clustering techniques such as feature-based clustering,
          109 distance-based clustering and random clustering, and generalise them to use
          110 different underlying GFM model types. We then propose a new methodology of
          111 clustered ensembles where we train multiple GFMs on different clusters of
          112 series, obtained by changing the number of clusters and cluster seeds. Using
          113 Feed-forward Neural Networks, Recurrent Neural Networks, and Pooled Regression
          114 models as the underlying GFMs, in our evaluation on eight publicly available
          115 datasets, the proposed models are able to achieve significantly higher accuracy
          116 than baseline GFM models and univariate forecasting methods.
          117 </summary></entry><entry><id>http://arxiv.org/abs/2009.13267</id><title>Energy-Based Reranking: Improving Neural Machine Translation Using Energy-Based Models (update)</title><updated>2021-09-23T09:06:49.536440+00:00</updated><author><name>Sumanta Bhattacharyya</name></author><author><name>Amirmohammad Rooshenas</name></author><author><name>Subhajit Naskar</name></author><author><name>Simeng Sun</name></author><author><name>Mohit Iyyer</name></author><author><name>Andrew McCallum</name></author><link href="http://arxiv.org/abs/2009.13267" rel="alternate"/><summary>The discrepancy between maximum likelihood estimation (MLE) and task measures
          118 such as BLEU score has been studied before for autoregressive neural machine
          119 translation (NMT) and resulted in alternative training algorithms (Ranzato et
          120 al., 2016; Norouzi et al., 2016; Shen et al., 2016; Wu et al., 2018). However,
          121 MLE training remains the de facto approach for autoregressive NMT because of
          122 its computational efficiency and stability. Despite this mismatch between the
          123 training objective and task measure, we notice that the samples drawn from an
          124 MLE-based trained NMT support the desired distribution -- there are samples
           125 with much higher BLEU scores compared to the beam decoding output. To benefit
          126 from this observation, we train an energy-based model to mimic the behavior of
          127 the task measure (i.e., the energy-based model assigns lower energy to samples
           128 with higher BLEU scores), which results in a re-ranking algorithm based on
          129 the samples drawn from NMT: energy-based re-ranking (EBR). We use both marginal
          130 energy models (over target sentence) and joint energy models (over both source
          131 and target sentences). Our EBR with the joint energy model consistently
          132 improves the performance of the Transformer-based NMT: +4 BLEU points on
           133 IWSLT'14 German-English, +3.0 BLEU points on Sinhala-English, +1.2 BLEU on
          134 WMT'16 English-German tasks.
          135 </summary></entry><entry><id>http://arxiv.org/abs/2005.11079</id><title>Graph Random Neural Network for Semi-Supervised Learning on Graphs (update)</title><updated>2021-09-23T09:06:49.535864+00:00</updated><author><name>Wenzheng Feng</name></author><author><name>Jie Zhang</name></author><author><name>Yuxiao Dong</name></author><author><name>Yu Han</name></author><author><name>Huanbo Luan</name></author><author><name>Qian Xu</name></author><author><name>Qiang Yang</name></author><author><name>Evgeny Kharlamov</name></author><author><name>Jie Tang</name></author><link href="http://arxiv.org/abs/2005.11079" rel="alternate"/><summary>We study the problem of semi-supervised learning on graphs, for which graph
          136 neural networks (GNNs) have been extensively explored. However, most existing
          137 GNNs inherently suffer from the limitations of over-smoothing, non-robustness,
          138 and weak-generalization when labeled nodes are scarce. In this paper, we
          139 propose a simple yet effective framework -- GRAPH RANDOM NEURAL NETWORKS
          140 (GRAND) -- to address these issues. In GRAND, we first design a random
          141 propagation strategy to perform graph data augmentation. Then we leverage
          142 consistency regularization to optimize the prediction consistency of unlabeled
          143 nodes across different data augmentations. Extensive experiments on graph
          144 benchmark datasets suggest that GRAND significantly outperforms
          145 state-of-the-art GNN baselines on semi-supervised node classification. Finally,
          146 we show that GRAND mitigates the issues of over-smoothing and non-robustness,
          147 exhibiting better generalization behavior than existing GNNs. The source code
          148 of GRAND is publicly available at https://github.com/Grand20/grand.
          149 </summary></entry><entry><id>http://arxiv.org/abs/2004.14427</id><title>Whittle index based Q-learning for restless bandits with average reward (update)</title><updated>2021-09-23T09:06:49.535532+00:00</updated><author><name>Konstantin E. Avrachenkov</name></author><author><name>Vivek S. Borkar</name></author><link href="http://arxiv.org/abs/2004.14427" rel="alternate"/><summary>A novel reinforcement learning algorithm is introduced for multiarmed
          150 restless bandits with average reward, using the paradigms of Q-learning and
          151 Whittle index. Specifically, we leverage the structure of the Whittle index
          152 policy to reduce the search space of Q-learning, resulting in major
          153 computational gains. Rigorous convergence analysis is provided, supported by
          154 numerical experiments. The numerical experiments show excellent empirical
          155 performance of the proposed scheme.
          156 </summary></entry><entry><id>http://arxiv.org/abs/2003.05738</id><title>IG-RL: Inductive Graph Reinforcement Learning for Massive-Scale Traffic Signal Control (update)</title><updated>2021-09-23T09:06:49.535152+00:00</updated><author><name>François-Xavier Devailly</name></author><author><name>Denis Larocque</name></author><author><name>Laurent Charlin</name></author><link href="http://arxiv.org/abs/2003.05738" rel="alternate"/><summary>Scaling adaptive traffic-signal control involves dealing with combinatorial
          157 state and action spaces. Multi-agent reinforcement learning attempts to address
          158 this challenge by distributing control to specialized agents. However,
          159 specialization hinders generalization and transferability, and the
           160 computational graphs underlying neural-network architectures -- dominating in
           161 the multi-agent setting -- do not offer the flexibility to handle an arbitrary
           162 number of entities, which changes both between road networks and over time as
          163 vehicles traverse the network. We introduce Inductive Graph Reinforcement
          164 Learning (IG-RL) based on graph-convolutional networks which adapts to the
          165 structure of any road network, to learn detailed representations of
          166 traffic-controllers and their surroundings. Our decentralized approach enables
          167 learning of a transferable-adaptive-traffic-signal-control policy. After being
          168 trained on an arbitrary set of road networks, our model can generalize to new
          169 road networks, traffic distributions, and traffic regimes, with no additional
          170 training and a constant number of parameters, enabling greater scalability
          171 compared to prior methods. Furthermore, our approach can exploit the
          172 granularity of available data by capturing the (dynamic) demand at both the
          173 lane and the vehicle levels. The proposed method is tested on both road
          174 networks and traffic settings never experienced during training. We compare
          175 IG-RL to multi-agent reinforcement learning and domain-specific baselines. In
          176 both synthetic road networks and in a larger experiment involving the control
          177 of the 3,971 traffic signals of Manhattan, we show that different
          178 instantiations of IG-RL outperform baselines.
          179 </summary></entry><entry><id>http://arxiv.org/abs/1905.10029</id><title>Power up! Robust Graph Convolutional Network via Graph Powering (update)</title><updated>2021-09-23T09:06:49.534750+00:00</updated><author><name>Ming Jin</name></author><author><name>Heng Chang</name></author><author><name>Wenwu Zhu</name></author><author><name>Somayeh Sojoudi</name></author><link href="http://arxiv.org/abs/1905.10029" rel="alternate"/><summary>Graph convolutional networks (GCNs) are powerful tools for graph-structured
          180 data. However, they have been recently shown to be vulnerable to topological
          181 attacks. To enhance adversarial robustness, we go beyond spectral graph theory
          182 to robust graph theory. By challenging the classical graph Laplacian, we
          183 propose a new convolution operator that is provably robust in the spectral
          184 domain and is incorporated in the GCN architecture to improve expressivity and
          185 interpretability. By extending the original graph to a sequence of graphs, we
          186 also propose a robust training paradigm that encourages transferability across
          187 graphs that span a range of spatial and spectral characteristics. The proposed
          188 approaches are demonstrated in extensive experiments to simultaneously improve
          189 performance in both benign and adversarial situations.
          190 </summary></entry><entry><id>http://arxiv.org/abs/2109.10319</id><title>Consistency of spectral clustering for directed network community detection</title><updated>2021-09-23T09:06:49.534381+00:00</updated><author><name>Huan Qing</name></author><author><name>Jingli Wang</name></author><link href="http://arxiv.org/abs/2109.10319" rel="alternate"/><summary>Directed networks appear in various areas, such as biology, sociology,
          191 physiology and computer science. However, at present, most network analysis
          192 ignores the direction. In this paper, we construct a spectral clustering method
           193 based on the singular value decomposition of the adjacency matrix to detect
           194 communities in the directed stochastic block model (DiSBM). By considering a
           195 sparsity parameter, under some mild conditions, we show the proposed approach
           196 can consistently recover hidden row and column communities for different
           197 scalings of degrees.
          198 </summary></entry><entry><id>http://arxiv.org/abs/2109.10298</id><title>Assured Neural Network Architectures for Control and Identification of Nonlinear Systems</title><updated>2021-09-23T09:06:49.534036+00:00</updated><author><name>James Ferlez</name></author><author><name>Yasser Shoukry</name></author><link href="http://arxiv.org/abs/2109.10298" rel="alternate"/><summary>In this paper, we consider the problem of automatically designing a Rectified
          199 Linear Unit (ReLU) Neural Network (NN) architecture (number of layers and
          200 number of neurons per layer) with the assurance that it is sufficiently
          201 parametrized to control a nonlinear system; i.e. control the system to satisfy
          202 a given formal specification. This is unlike current techniques, which provide
          203 no assurances on the resultant architecture. Moreover, our approach requires
          204 only limited knowledge of the underlying nonlinear system and specification. We
          205 assume only that the specification can be satisfied by a Lipschitz-continuous
          206 controller with a known bound on its Lipschitz constant; the specific
          207 controller need not be known. From this assumption, we bound the number of
          208 affine functions needed to construct a Continuous Piecewise Affine (CPWA)
          209 function that can approximate any Lipschitz-continuous controller that
          210 satisfies the specification. Then we connect this CPWA to a NN architecture
          211 using the authors' recent results on the Two-Level Lattice (TLL) NN
          212 architecture; the TLL architecture was shown to be parameterized by the number
          213 of affine functions present in the CPWA function it realizes.
          214 </summary></entry><entry><id>http://arxiv.org/abs/2109.10279</id><title>Multiblock-Networks: A Neural Network Analog to Component Based Methods for Multi-Source Data</title><updated>2021-09-23T09:06:49.533577+00:00</updated><author><name>Anna Jenul</name></author><author><name>Stefan Schrunner</name></author><author><name>Runar Helin</name></author><author><name>Kristian Hovde Liland</name></author><author><name>Cecilia Marie Futsæther</name></author><author><name>Oliver Tomic</name></author><link href="http://arxiv.org/abs/2109.10279" rel="alternate"/><summary>Training predictive models on datasets from multiple sources is a common, yet
          215 challenging setup in applied machine learning. Even though model interpretation
          216 has attracted more attention in recent years, many modeling approaches still
          217 focus mainly on performance. To further improve the interpretability of machine
          218 learning models, we suggest the adoption of concepts and tools from the
          219 well-established framework of component based multiblock analysis, also known
          220 as chemometrics. Nevertheless, artificial neural networks provide greater
          221 flexibility in model architecture and thus, often deliver superior predictive
          222 performance. In this study, we propose a setup to transfer the concepts of
          223 component based statistical models, including multiblock variants of principal
          224 component regression and partial least squares regression, to neural network
          225 architectures. Thereby, we combine the flexibility of neural networks with the
          226 concepts for interpreting block relevance in multiblock methods. In two use
          227 cases we demonstrate how the concept can be implemented in practice, and
          228 compare it to both common feed-forward neural networks without blocks, as well
          229 as statistical component based multiblock methods. Our results underline that
          230 multiblock networks allow for basic model interpretation while matching the
          231 performance of ordinary feed-forward neural networks.
          232 </summary></entry><entry><id>http://arxiv.org/abs/2109.10262</id><title>Generalized Optimization: A First Step Towards Category Theoretic Learning Theory</title><updated>2021-09-23T09:06:49.533249+00:00</updated><author><name>Dan Shiebler</name></author><link href="http://arxiv.org/abs/2109.10262" rel="alternate"/><summary>The Cartesian reverse derivative is a categorical generalization of
          233 reverse-mode automatic differentiation. We use this operator to generalize
          234 several optimization algorithms, including a straightforward generalization of
          235 gradient descent and a novel generalization of Newton's method. We then explore
          236 which properties of these algorithms are preserved in this generalized setting.
          237 First, we show that the transformation invariances of these algorithms are
          238 preserved: while generalized Newton's method is invariant to all invertible
          239 linear transformations, generalized gradient descent is invariant only to
          240 orthogonal linear transformations. Next, we show that we can express the change
          241 in loss of generalized gradient descent with an inner product-like expression,
          242 thereby generalizing the non-increasing and convergence properties of the
          243 gradient descent optimization flow. Finally, we include several numerical
          244 experiments to illustrate the ideas in the paper and demonstrate how we can use
          245 them to optimize polynomial functions over an ordered ring.
          246 </summary></entry><entry><id>http://arxiv.org/abs/2109.10254</id><title>Uncertainty Toolbox: an Open-Source Library for Assessing, Visualizing, and Improving Uncertainty Quantification</title><updated>2021-09-23T09:06:49.532048+00:00</updated><author><name>Youngseog Chung</name></author><author><name>Ian Char</name></author><author><name>Han Guo</name></author><author><name>Jeff Schneider</name></author><author><name>Willie Neiswanger</name></author><link href="http://arxiv.org/abs/2109.10254" rel="alternate"/><summary>With increasing deployment of machine learning systems in various real-world
          247 tasks, there is a greater need for accurate quantification of predictive
          248 uncertainty. While the common goal in uncertainty quantification (UQ) in
          249 machine learning is to approximate the true distribution of the target data,
          250 many works in UQ tend to be disjoint in the evaluation metrics utilized, and
          251 disparate implementations for each metric lead to numerical results that are
          252 not directly comparable across different works. To address this, we introduce
           253 Uncertainty Toolbox, an open-source Python library that helps to assess,
          254 visualize, and improve UQ. Uncertainty Toolbox additionally provides
          255 pedagogical resources, such as a glossary of key terms and an organized
          256 collection of key paper references. We hope that this toolbox is useful for
          257 accelerating and uniting research efforts in uncertainty in machine learning.
          258 </summary></entry><entry><id>http://arxiv.org/abs/2109.10219</id><title>Adaptive Reliability Analysis for Multi-fidelity Models using a Collective Learning Strategy</title><updated>2021-09-23T09:06:49.531656+00:00</updated><author><name>Chi Zhang</name></author><author><name>Chaolin Song</name></author><author><name>Abdollah Shafieezadeh</name></author><link href="http://arxiv.org/abs/2109.10219" rel="alternate"/><summary>In many fields of science and engineering, models with different fidelities
          259 are available. Physical experiments or detailed simulations that accurately
          260 capture the behavior of the system are regarded as high-fidelity models with
           261 low model uncertainty; however, they are expensive to run. On the other hand,
          262 simplified physical experiments or numerical models are seen as low-fidelity
          263 models that are cheaper to evaluate. Although low-fidelity models are often not
          264 suitable for direct use in reliability analysis due to their low accuracy, they
          265 can offer information about the trend of the high-fidelity model thus providing
          266 the opportunity to explore the design space at a low cost. This study presents
          267 a new approach called adaptive multi-fidelity Gaussian process for reliability
          268 analysis (AMGPRA). Contrary to selecting training points and information
           269 sources in two separate stages, as done in the state-of-the-art mfEGRA method, the
          270 proposed approach finds the optimal training point and information source
          271 simultaneously using the novel collective learning function (CLF). CLF is able
          272 to assess the global impact of a candidate training point from an information
          273 source and it accommodates any learning function that satisfies a certain
          274 profile. In this context, CLF provides a new direction for quantifying the
          275 impact of new training points and can be easily extended with new learning
          276 functions to adapt to different reliability problems. The performance of the
          277 proposed method is demonstrated by three mathematical examples and one
          278 engineering problem concerning the wind reliability of transmission towers. It
          279 is shown that the proposed method achieves similar or higher accuracy with
          280 reduced computational costs compared to state-of-the-art single and
          281 multi-fidelity methods. A key application of AMGPRA is high-fidelity fragility
          282 modeling using complex and costly physics-based computational models.
          283 </summary></entry><entry><id>http://arxiv.org/abs/2109.10162</id><title>Learning low-degree functions from a logarithmic number of random queries</title><updated>2021-09-23T09:06:49.531322+00:00</updated><author><name>Alexandros Eskenazis</name></author><author><name>Paata Ivanisvili</name></author><link href="http://arxiv.org/abs/2109.10162" rel="alternate"/><summary>We prove that for any integer $n\in\mathbb{N}$, $d\in\{1,\ldots,n\}$ and any
          284 $\varepsilon,\delta\in(0,1)$, a bounded function $f:\{-1,1\}^n\to[-1,1]$ of
          285 degree at most $d$ can be learned with probability at least $1-\delta$ and
          286 $L_2$-error $\varepsilon$ using $\log(\tfrac{n}{\delta})\,\varepsilon^{-d-1}
          287 C^{d^{3/2}\sqrt{\log d}}$ random queries for a universal finite constant $C&gt;1$.
          288 </summary></entry><entry><id>http://arxiv.org/abs/2109.09988</id><title>Signal Classification using Smooth Coefficients of Multiple wavelets</title><updated>2021-09-23T09:06:49.530981+00:00</updated><author><name>Paul Grant</name></author><author><name>Md Zahidul Islam</name></author><link href="http://arxiv.org/abs/2109.09988" rel="alternate"/><summary>Classification of time series signals has become an important construct and
          289 has many practical applications. With existing classifiers we may be able to
           290 accurately classify signals; however, that accuracy may decline when using a
          291 reduced number of attributes. Transforming the data then undertaking reduction
          292 in dimensionality may improve the quality of the data analysis, decrease time
          293 required for classification and simplify models. We propose an approach, which
          294 chooses suitable wavelets to transform the data, then combines the output from
           295 these transforms to construct a dataset to which ensemble classifiers are then applied.
           296 We demonstrate this on different datasets, across different classifiers, and
          297 use differing evaluation methods. Our experimental results demonstrate the
          298 effectiveness of the proposed technique, compared to the approaches that use
          299 either raw signal data or a single wavelet transform.
          300 </summary></entry><entry><id>http://arxiv.org/abs/2109.09859</id><title>Sharp global convergence guarantees for iterative nonconvex optimization: A Gaussian process perspective</title><updated>2021-09-23T09:06:49.530601+00:00</updated><author><name>Kabir Aladin Chandrasekher</name></author><author><name>Ashwin Pananjady</name></author><author><name>Christos Thrampoulidis</name></author><link href="http://arxiv.org/abs/2109.09859" rel="alternate"/><summary>We consider a general class of regression models with normally distributed
          301 covariates, and the associated nonconvex problem of fitting these models from
          302 data. We develop a general recipe for analyzing the convergence of iterative
          303 algorithms for this task from a random initialization. In particular, provided
          304 each iteration can be written as the solution to a convex optimization problem
          305 satisfying some natural conditions, we leverage Gaussian comparison theorems to
          306 derive a deterministic sequence that provides sharp upper and lower bounds on
          307 the error of the algorithm with sample-splitting. Crucially, this deterministic
          308 sequence accurately captures both the convergence rate of the algorithm and the
          309 eventual error floor in the finite-sample regime, and is distinct from the
          310 commonly used "population" sequence that results from taking the
          311 infinite-sample limit. We apply our general framework to derive several
          312 concrete consequences for parameter estimation in popular statistical models
          313 including phase retrieval and mixtures of regressions. Provided the sample size
          314 scales near-linearly in the dimension, we show sharp global convergence rates
          315 for both higher-order algorithms based on alternating updates and first-order
          316 algorithms based on subgradient descent. These corollaries, in turn, yield
          317 multiple consequences, including: (a) Proof that higher-order algorithms can
          318 converge significantly faster than their first-order counterparts (and
          319 sometimes super-linearly), even if the two share the same population update and
          320 (b) Intricacies in super-linear convergence behavior for higher-order
          321 algorithms, which can be nonstandard (e.g., with exponent 3/2) and sensitive to
          322 the noise level in the problem. We complement these results with extensive
          323 numerical experiments, which show excellent agreement with our theoretical
          324 predictions.
          325 </summary></entry><entry><id>http://arxiv.org/abs/2109.09856</id><title>SFFDD: Deep Neural Network with Enriched Features for Failure Prediction with Its Application to Computer Disk Driver</title><updated>2021-09-23T09:06:49.530264+00:00</updated><author><name>Lanfa Frank Wang</name></author><author><name>Danjue Li</name></author><link href="http://arxiv.org/abs/2109.09856" rel="alternate"/><summary>A classification technique incorporating a novel feature derivation method is
          326 proposed for predicting failure of a system or device with multivariate time
          327 series sensor data. We treat the multivariate time series sensor data as images
          328 for both visualization and computation. Failure follows various patterns which
          329 are closely related to the root causes. Different predefined transformations
          330 are applied on the original sensors data to better characterize the failure
           331 patterns. In addition to feature derivation, an ensemble method is used to further
           332 improve the performance. Furthermore, a general algorithm architecture of deep
           333 neural networks is proposed to handle multiple types of data with less manual
           334 feature engineering. We apply the proposed method to early failure prediction
           335 for computer disk drives in order to improve storage system availability and
          336 avoid data loss. The classification accuracy is largely improved with the
          337 enriched features, named smart features.
          338 </summary></entry><entry><id>http://arxiv.org/abs/2109.09855</id><title>Reinforcement Learning for Finite-Horizon Restless Multi-Armed Multi-Action Bandits</title><updated>2021-09-23T09:06:49.529889+00:00</updated><author><name>Guojun Xiong</name></author><author><name>Jian Li</name></author><author><name>Rahul Singh</name></author><link href="http://arxiv.org/abs/2109.09855" rel="alternate"/><summary>We study a finite-horizon restless multi-armed bandit problem with multiple
          339 actions, dubbed R(MA)^2B. The state of each arm evolves according to a
          340 controlled Markov decision process (MDP), and the reward of pulling an arm
          341 depends on both the current state of the corresponding MDP and the action
          342 taken. The goal is to sequentially choose actions for arms so as to maximize
          343 the expected value of the cumulative rewards collected. Since finding the
          344 optimal policy is typically intractable, we propose a computationally appealing
          345 index policy which we call Occupancy-Measured-Reward Index Policy. Our policy
          346 is well-defined even if the underlying MDPs are not indexable. We prove that it
          347 is asymptotically optimal when the activation budget and number of arms are
          348 scaled up, while keeping their ratio as a constant. For the case when the
          349 system parameters are unknown, we develop a learning algorithm. Our learning
          350 algorithm uses the principle of optimism in the face of uncertainty and further
          351 uses a generative model in order to fully exploit the structure of
          352 Occupancy-Measured-Reward Index Policy. We call it the R(MA)^2B-UCB algorithm.
          353 As compared with the existing algorithms, R(MA)^2B-UCB performs close to an
           354 offline optimal policy, and also achieves sub-linear regret with a low
          355 computational complexity. Experimental results show that R(MA)^2B-UCB
          356 outperforms the existing algorithms in both regret and run time.
          357 </summary></entry><entry><id>http://arxiv.org/abs/2109.09847</id><title>Fast TreeSHAP: Accelerating SHAP Value Computation for Trees</title><updated>2021-09-23T09:06:49.529569+00:00</updated><author><name>Jilei Yang</name></author><link href="http://arxiv.org/abs/2109.09847" rel="alternate"/><summary>SHAP (SHapley Additive exPlanation) values are one of the leading tools for
          358 interpreting machine learning models, with strong theoretical guarantees
          359 (consistency, local accuracy) and a wide availability of implementations and
          360 use cases. Even though computing SHAP values takes exponential time in general,
          361 TreeSHAP takes polynomial time on tree-based models. While the speedup is
          362 significant, TreeSHAP can still dominate the computation time of industry-level
          363 machine learning solutions on datasets with millions or more entries, causing
          364 delays in post-hoc model diagnosis and interpretation service. In this paper we
          365 present two new algorithms, Fast TreeSHAP v1 and v2, designed to improve the
          366 computational efficiency of TreeSHAP for large datasets. We empirically find
          367 that Fast TreeSHAP v1 is 1.5x faster than TreeSHAP while keeping the memory
          368 cost unchanged. Similarly, Fast TreeSHAP v2 is 2.5x faster than TreeSHAP, at
          369 the cost of a slightly higher memory usage, thanks to the pre-computation of
          370 expensive TreeSHAP steps. We also show that Fast TreeSHAP v2 is well-suited for
          371 multi-time model interpretations, resulting in as high as 3x faster explanation
          372 of newly incoming samples.
          373 </summary></entry><entry><id>http://arxiv.org/abs/2109.09831</id><title>SMAC3: A Versatile Bayesian Optimization Package for Hyperparameter Optimization</title><updated>2021-09-23T09:06:49.529020+00:00</updated><author><name>Marius Lindauer</name></author><author><name>Katharina Eggensperger</name></author><author><name>Matthias Feurer</name></author><author><name>André Biedenkapp</name></author><author><name>Difan Deng</name></author><author><name>Carolin Benjamins</name></author><author><name>René Sass</name></author><author><name>Frank Hutter</name></author><link href="http://arxiv.org/abs/2109.09831" rel="alternate"/><summary>Algorithm parameters, in particular hyperparameters of machine learning
          374 algorithms, can substantially impact their performance. To support users in
          375 determining well-performing hyperparameter configurations for their algorithms,
          376 datasets and applications at hand, SMAC3 offers a robust and flexible framework
          377 for Bayesian Optimization, which can improve performance within a few
          378 evaluations. It offers several facades and pre-sets for typical use cases, such
          379 as optimizing hyperparameters, solving low dimensional continuous (artificial)
          380 global optimization problems and configuring algorithms to perform well across
          381 multiple problem instances. The SMAC3 package is available under a permissive
           382 BSD license at https://github.com/automl/SMAC3.
          383 </summary></entry><entry><id>http://arxiv.org/abs/2109.09816</id><title>Deviation-Based Learning</title><updated>2021-09-23T09:06:49.528686+00:00</updated><author><name>Junpei Komiyama</name></author><author><name>Shunya Noda</name></author><link href="http://arxiv.org/abs/2109.09816" rel="alternate"/><summary>We propose deviation-based learning, a new approach to training recommender
          384 systems. In the beginning, the recommender and rational users have different
          385 pieces of knowledge, and the recommender needs to learn the users' knowledge to
          386 make better recommendations. The recommender learns users' knowledge by
          387 observing whether each user followed or deviated from her recommendations. We
          388 show that learning frequently stalls if the recommender always recommends a
          389 choice: users tend to follow the recommendation blindly, and their choices do
          390 not reflect their knowledge. Social welfare and the learning rate are improved
          391 drastically if the recommender abstains from recommending a choice when she
          392 predicts that multiple arms will produce a similar payoff.
          393 </summary></entry><entry><id>http://arxiv.org/abs/2011.02602</id><title>Merchant Category Identification Using Credit Card Transactions</title><updated>2021-09-23T09:06:49.528234+00:00</updated><author><name>Chin-Chia Michael Yeh</name></author><author><name>Zhongfang Zhuang</name></author><author><name>Yan Zheng</name></author><author><name>Liang Wang</name></author><author><name>Junpeng Wang</name></author><author><name>Wei Zhang</name></author><link href="http://arxiv.org/abs/2011.02602" rel="alternate"/><summary>Digital payment volume has proliferated in recent years with the rapid growth
          394 of small businesses and online shops. When processing these digital
          395 transactions, recognizing each merchant's real identity (i.e., business type)
          396 is vital to ensure the integrity of payment processing systems. Conventionally,
          397 this problem is formulated as a time series classification problem solely using
          398 the merchant transaction history. However, with the large scale of the data,
          399 and changing behaviors of merchants and consumers over time, it is extremely
           400 challenging to achieve satisfactory performance from off-the-shelf classification
          401 methods. In this work, we approach this problem from a multi-modal learning
          402 perspective, where we use not only the merchant time series data but also the
          403 information of merchant-merchant relationship (i.e., affinity) to verify the
          404 self-reported business type (i.e., merchant category) of a given merchant.
          405 Specifically, we design two individual encoders, where one is responsible for
          406 encoding temporal information and the other is responsible for affinity
          407 information, and a mechanism to fuse the outputs of the two encoders to
          408 accomplish the identification task. Our experiments on real-world credit card
          409 transaction data between 71,668 merchants and 433,772,755 customers have
          410 demonstrated the effectiveness and efficiency of the proposed model.
          411 </summary></entry><entry><id>http://arxiv.org/abs/2007.05303</id><title>Multi-future Merchant Transaction Prediction</title><updated>2021-09-23T09:06:49.527829+00:00</updated><author><name>Chin-Chia Michael Yeh</name></author><author><name>Zhongfang Zhuang</name></author><author><name>Wei Zhang</name></author><author><name>Liang Wang</name></author><link href="http://arxiv.org/abs/2007.05303" rel="alternate"/><summary>The multivariate time series generated from merchant transaction history can
          412 provide critical insights for payment processing companies. The capability of
          413 predicting merchants' future is crucial for fraud detection and recommendation
          414 systems. Conventionally, this problem is formulated to predict one multivariate
          415 time series under the multi-horizon setting. However, real-world applications
          416 often require more than one future trend prediction considering the
          417 uncertainties, where more than one multivariate time series needs to be
          418 predicted. This problem is called multi-future prediction. In this work, we
          419 combine the two research directions and propose to study this new problem:
          420 multi-future, multi-horizon and multivariate time series prediction. This
          421 problem is crucial as it has broad use cases in the financial industry to
          422 reduce the risk while improving user experience by providing alternative
          423 futures. This problem is also challenging as now we not only need to capture
          424 the patterns and insights from the past but also train a model that has a
          425 strong inference capability to project multiple possible outcomes. To solve
          426 this problem, we propose a new model using convolutional neural networks and a
          427 simple yet effective encoder-decoder structure to learn the time series pattern
          428 from multiple perspectives. We use experiments on real-world merchant
          429 transaction data to demonstrate the effectiveness of our proposed model. We
          430 also provide extensive discussions on different model design choices in our
          431 experimental section.
          432 </summary></entry><entry><id>http://arxiv.org/abs/2109.09690</id><title>Trust Your Robots! Predictive Uncertainty Estimation of Neural Networks with Sparse Gaussian Processes (update)</title><updated>2021-09-23T09:06:49.527407+00:00</updated><author><name>Jongseok Lee</name></author><author><name>Jianxiang Feng</name></author><author><name>Matthias Humt</name></author><author><name>Marcus G. Müller</name></author><author><name>Rudolph Triebel</name></author><link href="http://arxiv.org/abs/2109.09690" rel="alternate"/><summary>This paper presents a probabilistic framework to obtain both reliable and
          433 fast uncertainty estimates for predictions with Deep Neural Networks (DNNs).
          434 Our main contribution is a practical and principled combination of DNNs with
          435 sparse Gaussian Processes (GPs). We prove theoretically that DNNs can be seen
          436 as a special case of sparse GPs, namely mixtures of GP experts (MoE-GP), and we
          437 devise a learning algorithm that brings the derived theory into practice. In
          438 experiments from two different robotic tasks -- inverse dynamics of a
          439 manipulator and object detection on a micro-aerial vehicle (MAV) -- we show the
          440 effectiveness of our approach in terms of predictive uncertainty, improved
          441 scalability, and run-time efficiency on a Jetson TX2. We thus argue that our
          442 approach can pave the way towards reliable and fast robot learning systems with
          443 uncertainty awareness.
          444 </summary></entry><entry><id>http://arxiv.org/abs/2109.09658</id><title>FUTURE-AI: Guiding Principles and Consensus Recommendations for Trustworthy Artificial Intelligence in Future Medical Imaging (update)</title><updated>2021-09-23T09:06:49.526638+00:00</updated><author><name>Karim Lekadir</name></author><author><name>Richard Osuala</name></author><author><name>Catherine Gallin</name></author><author><name>Noussair Lazrak</name></author><author><name>Kaisar Kushibar</name></author><author><name>Gianna Tsakou</name></author><author><name>Susanna Aussó</name></author><author><name>Leonor Cerdá Alberich</name></author><author><name>Konstantinos Marias</name></author><author><name>Manolis Tskinakis</name></author><author><name>Sara Colantonio</name></author><author><name>Nickolas Papanikolaou</name></author><author><name>Zohaib Salahuddin</name></author><author><name>Henry C Woodruff</name></author><author><name>Philippe Lambin</name></author><author><name>Luis Martí-Bonmatí</name></author><link href="http://arxiv.org/abs/2109.09658" rel="alternate"/><summary>The recent advancements in artificial intelligence (AI) combined with the
           445 extensive amount of data generated by today's clinical systems, have led to the
          446 development of imaging AI solutions across the whole value chain of medical
          447 imaging, including image reconstruction, medical image segmentation,
          448 image-based diagnosis and treatment planning. Notwithstanding the successes and
           449 future potential of AI in medical imaging, many stakeholders are concerned about
          450 the potential risks and ethical implications of imaging AI solutions, which are
          451 perceived as complex, opaque, and difficult to comprehend, utilise, and trust
          452 in critical clinical applications. Despite these concerns and risks, there are
           453 currently no concrete guidelines or best practices for guiding future AI
          454 developments in medical imaging towards increased trust, safety and adoption.
          455 To bridge this gap, this paper introduces a careful selection of guiding
          456 principles drawn from the accumulated experiences, consensus, and best
          457 practices from five large European projects on AI in Health Imaging. These
           458 guiding principles are named FUTURE-AI, and their building blocks consist of (i)
          459 Fairness, (ii) Universality, (iii) Traceability, (iv) Usability, (v) Robustness
          460 and (vi) Explainability. In a step-by-step approach, these guidelines are
          461 further translated into a framework of concrete recommendations for specifying,
          462 developing, evaluating, and deploying technically, clinically and ethically
          463 trustworthy AI solutions into clinical practice.
          464 </summary></entry><entry><id>http://arxiv.org/abs/2109.09105</id><title>What BERT Based Language Models Learn in Spoken Transcripts: An Empirical Study (update)</title><updated>2021-09-23T09:06:49.526265+00:00</updated><author><name>Ayush Kumar</name></author><author><name>Mukuntha Narayanan Sundararaman</name></author><author><name>Jithendra Vepa</name></author><link href="http://arxiv.org/abs/2109.09105" rel="alternate"/><summary>Language Models (LMs) have been ubiquitously leveraged in various tasks
          465 including spoken language understanding (SLU). Spoken language requires careful
          466 understanding of speaker interactions, dialog states and speech induced
          467 multimodal behaviors to generate a meaningful representation of the
          468 conversation. In this work, we propose to dissect SLU into three representative
           469 properties: conversational (disfluency, pause, overtalk), channel (speaker-type,
           470 turn-tasks), and ASR (insertion, deletion, substitution). We probe BERT-based
          471 language models (BERT, RoBERTa) trained on spoken transcripts to investigate
           472 their ability to understand multifarious properties in the absence of any speech
          473 cues. Empirical results indicate that LM is surprisingly good at capturing
          474 conversational properties such as pause prediction and overtalk detection from
           475 lexical tokens. On the downside, the LM scores low on turn-tasks and ASR
           476 error prediction. Additionally, pre-training the LM on spoken transcripts
           477 restrains its linguistic understanding. Finally, we establish the efficacy and
          478 transferability of the mentioned properties on two benchmark datasets:
          479 Switchboard Dialog Act and Disfluency datasets.
          480 </summary></entry><entry><id>http://arxiv.org/abs/2109.07436</id><title>Synthesizing Policies That Account For Human Execution Errors Caused By State-Aliasing In Markov Decision Processes (update)</title><updated>2021-09-23T09:06:49.525891+00:00</updated><author><name>Sriram Gopalakrishnan</name></author><author><name>Mudit Verma</name></author><author><name>Subbarao Kambhampati</name></author><link href="http://arxiv.org/abs/2109.07436" rel="alternate"/><summary>When humans are given a policy to execute, there can be policy execution
          481 errors and deviations in execution if there is uncertainty in identifying a
          482 state. So an algorithm that computes a policy for a human to execute ought to
          483 consider these effects in its computations. An optimal MDP policy that is
           484 poorly executed (because of a human agent) may be much worse than another policy
          485 that is executed with fewer errors. In this paper, we consider the problems of
          486 erroneous execution and execution delay when computing policies for a human
          487 agent that would act in a setting modeled by a Markov Decision Process. We
          488 present a framework to model the likelihood of policy execution errors and
          489 likelihood of non-policy actions like inaction (delays) due to state
          490 uncertainty. This is followed by a hill climbing algorithm to search for good
          491 policies that account for these errors. We then use the best policy found by
          492 hill climbing with a branch and bound algorithm to find the optimal policy. We
          493 show experimental results in a Gridworld domain and analyze the performance of
           494 the two algorithms. We also present human studies that verify whether our
          495 assumptions on policy execution by humans under state-aliasing are reasonable.
          496 </summary></entry><entry><id>http://arxiv.org/abs/2109.01134</id><title>Learning to Prompt for Vision-Language Models (update)</title><updated>2021-09-23T09:06:49.525484+00:00</updated><author><name>Kaiyang Zhou</name></author><author><name>Jingkang Yang</name></author><author><name>Chen Change Loy</name></author><author><name>Ziwei Liu</name></author><link href="http://arxiv.org/abs/2109.01134" rel="alternate"/><summary>Vision-language pre-training has recently emerged as a promising alternative
          497 for representation learning. It shifts from the tradition of using images and
          498 discrete labels for learning a fixed set of weights, seen as visual concepts,
          499 to aligning images and raw text for two separate encoders. Such a paradigm
          500 benefits from a broader source of supervision and allows zero-shot transfer to
           501 downstream tasks since visual concepts can be directly generated from
          502 natural language, known as prompt. In this paper, we identify that a major
          503 challenge of deploying such models in practice is prompt engineering. This is
          504 because designing a proper prompt, especially for context words surrounding a
          505 class name, requires domain expertise and typically takes a significant amount
           506 of time for word tuning, since a slight change in wording could have a huge
          507 impact on performance. Moreover, different downstream tasks require specific
          508 designs, further hampering the efficiency of deployment. To overcome this
          509 challenge, we propose a novel approach named context optimization (CoOp). The
          510 main idea is to model context in prompts using continuous representations and
          511 perform end-to-end learning from data while keeping the pre-trained parameters
          512 fixed. In this way, the design of task-relevant prompts can be fully automated.
          513 Experiments on 11 datasets show that CoOp effectively turns pre-trained
          514 vision-language models into data-efficient visual learners, requiring as few as
           515 one or two shots to beat hand-crafted prompts by a decent margin, and gaining
           516 significant improvements when using more shots (e.g., at 16 shots the
          517 average gain is around 17% with the highest reaching over 50%). CoOp also
          518 exhibits strong robustness to distribution shift.
          519 </summary></entry><entry><id>http://arxiv.org/abs/2108.09432</id><title>ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape Generators (update)</title><updated>2021-09-23T09:06:49.525027+00:00</updated><author><name>Qixing Huang</name></author><author><name>Xiangru Huang</name></author><author><name>Bo Sun</name></author><author><name>Zaiwei Zhang</name></author><author><name>Junfeng Jiang</name></author><author><name>Chandrajit Bajaj</name></author><link href="http://arxiv.org/abs/2108.09432" rel="alternate"/><summary>This paper introduces an unsupervised loss for training parametric
          520 deformation shape generators. The key idea is to enforce the preservation of
          521 local rigidity among the generated shapes. Our approach builds on an
           522 approximation of the as-rigid-as-possible (or ARAP) deformation energy. We show
          523 how to develop the unsupervised loss via a spectral decomposition of the
          524 Hessian of the ARAP energy. Our loss nicely decouples pose and shape variations
          525 through a robust norm. The loss admits simple closed-form expressions. It is
           526 easy to train and can be plugged into any standard generative model, e.g.,
          527 variational auto-encoder (VAE) and auto-decoder (AD). Experimental results show
          528 that our approach outperforms existing shape generation approaches considerably
          529 on public benchmark datasets of various shape categories such as human, animal
          530 and bone.
          531 </summary></entry><entry><id>http://arxiv.org/abs/2107.11913</id><title>Measuring Ethics in AI with AI: A Methodology and Dataset Construction (update)</title><updated>2021-09-23T09:06:49.524619+00:00</updated><author><name>Pedro H.C. Avelar</name></author><author><name>Rafael B. Audibert</name></author><author><name>Anderson R. Tavares</name></author><author><name>Luís C. Lamb</name></author><link href="http://arxiv.org/abs/2107.11913" rel="alternate"/><summary>Recently, the use of sound measures and metrics in Artificial Intelligence
          532 has become the subject of interest of academia, government, and industry.
          533 Efforts towards measuring different phenomena have gained traction in the AI
          534 community, as illustrated by the publication of several influential field
reports and policy documents. These metrics are designed to help decision
makers stay informed about the fast-moving and impactful influences of
          537 key advances in Artificial Intelligence in general and Machine Learning in
          538 particular. In this paper we propose to use such newfound capabilities of AI
          539 technologies to augment our AI measuring capabilities. We do so by training a
          540 model to classify publications related to ethical issues and concerns. In our
          541 methodology we use an expert, manually curated dataset as the training set and
          542 then evaluate a large set of research papers. Finally, we highlight the
          543 implications of AI metrics, in particular their contribution towards developing
trustful and fair AI-based tools and technologies. Keywords: AI Ethics; AI
Fairness; AI Measurement; Ethics in Computer Science.
          546 </summary></entry><entry><id>http://arxiv.org/abs/2107.04775</id><title>LS3: Latent Space Safe Sets for Long-Horizon Visuomotor Control of Sparse Reward Iterative Tasks (update)</title><updated>2021-09-23T09:06:49.524190+00:00</updated><author><name>Albert Wilcox</name></author><author><name>Ashwin Balakrishna</name></author><author><name>Brijen Thananjeyan</name></author><author><name>Joseph E. Gonzalez</name></author><author><name>Ken Goldberg</name></author><link href="http://arxiv.org/abs/2107.04775" rel="alternate"/><summary>Reinforcement learning (RL) has shown impressive success in exploring
          547 high-dimensional environments to learn complex tasks, but can often exhibit
          548 unsafe behaviors and require extensive environment interaction when exploration
          549 is unconstrained. A promising strategy for learning in dynamically uncertain
          550 environments is requiring that the agent can robustly return to learned safe
          551 sets, where task success (and therefore safety) can be guaranteed. While this
          552 approach has been successful in low-dimensions, enforcing this constraint in
          553 environments with visual observations is exceedingly challenging. We present a
          554 novel continuous representation for safe sets by framing it as a binary
          555 classification problem in a learned latent space, which flexibly scales to
          556 image observations. We then present a new algorithm, Latent Space Safe Sets
          557 (LS3), which uses this representation for long-horizon tasks with sparse
          558 rewards. We evaluate LS3 on 4 domains, including a challenging sequential
          559 pushing task in simulation and a physical cable routing task. We find that LS3
          560 can use prior task successes to restrict exploration and learn more efficiently
          561 than prior algorithms while satisfying constraints. See
          562 https://tinyurl.com/latent-ss for code and supplementary material.
          563 </summary></entry><entry><id>http://arxiv.org/abs/2106.07857</id><title>Bilateral Personalized Dialogue Generation with Contrastive Learning (update)</title><updated>2021-09-23T09:06:49.523794+00:00</updated><author><name>Bin Li</name></author><author><name>Hanjun Deng</name></author><link href="http://arxiv.org/abs/2106.07857" rel="alternate"/><summary>Generating personalized responses is one of the major challenges in natural
human-robot interaction. Current research in this field mainly focuses on
          565 generating responses consistent with the robot's pre-assigned persona, while
          566 ignoring the user's persona. Such responses may be inappropriate or even
offensive, which may lead to a bad user experience. Therefore, we propose a
          568 Bilateral Personalized Dialogue Generation (BPDG) method for dyadic
          569 conversation, which integrates user and robot personas into dialogue generation
          570 via designing a dynamic persona-aware fusion method. To bridge the gap between
          571 the learning objective function and evaluation metrics, the Conditional Mutual
          572 Information Maximum (CMIM) criterion is adopted with contrastive learning to
          573 select the proper response from the generated candidates. Moreover, a bilateral
          574 persona accuracy metric is designed to measure the degree of bilateral
          575 personalization. Experimental results demonstrate that, compared with several
          576 state-of-the-art methods, the final results of the proposed method are more
          577 personalized and consistent with bilateral personas in terms of both automatic
          578 and manual evaluations.
          579 </summary></entry><entry><id>http://arxiv.org/abs/2105.15033</id><title>DiaKG: an Annotated Diabetes Dataset for Medical Knowledge Graph Construction (update)</title><updated>2021-09-23T09:06:49.523206+00:00</updated><author><name>Dejie Chang</name></author><author><name>Mosha Chen</name></author><author><name>Chaozhen Liu</name></author><author><name>Liping Liu</name></author><author><name>Dongdong Li</name></author><author><name>Wei Li</name></author><author><name>Fei Kong</name></author><author><name>Bangchang Liu</name></author><author><name>Xiaobin Luo</name></author><author><name>Ji Qi</name></author><author><name>Qiao Jin</name></author><author><name>Bin Xu</name></author><link href="http://arxiv.org/abs/2105.15033" rel="alternate"/><summary>Knowledge Graph has been proven effective in modeling structured information
          580 and conceptual knowledge, especially in the medical domain. However, the lack
          581 of high-quality annotated corpora remains a crucial problem for advancing the
          582 research and applications on this task. In order to accelerate the research for
          583 domain-specific knowledge graphs in the medical domain, we introduce DiaKG, a
          584 high-quality Chinese dataset for Diabetes knowledge graph, which contains
          585 22,050 entities and 6,890 relations in total. We implement recent typical
          586 methods for Named Entity Recognition and Relation Extraction as a benchmark to
          587 evaluate the proposed dataset thoroughly. Empirical results show that the DiaKG
          588 is challenging for most existing methods and further analysis is conducted to
          589 discuss future research direction for improvements. We hope the release of this
          590 dataset can assist the construction of diabetes knowledge graphs and facilitate
          591 AI-based applications.
          592 </summary></entry><entry><id>http://arxiv.org/abs/2105.11844</id><title>CI-dataset and DetDSCI methodology for detecting too small and too large critical infrastructures in satellite images: Airports and electrical substations as case study (update)</title><updated>2021-09-23T09:06:49.522772+00:00</updated><author><name>Francisco Pérez-Hernández</name></author><author><name>José Rodríguez-Ortega</name></author><author><name>Yassir Benhammou</name></author><author><name>Francisco Herrera</name></author><author><name>Siham Tabik</name></author><link href="http://arxiv.org/abs/2105.11844" rel="alternate"/><summary>The detection of critical infrastructures in large territories represented by
aerial and satellite images is of high importance in several fields such as
          594 security, anomaly detection, land use planning and land use change detection.
          595 However, the detection of such infrastructures is complex as they have highly
          596 variable shapes and sizes, i.e., some infrastructures, such as electrical
          597 substations, are too small while others, such as airports, are too large.
Besides, airports can have either a small or a very large surface area with
completely different shapes, which makes their correct detection challenging. As
          600 far as we know, these limitations have not been tackled yet in previous works.
          601 This paper presents (1) a smart Critical Infrastructure dataset, named
CI-dataset, organised into two scales, small-scale and large-scale critical
          603 infrastructures and (2) a two-level resolution-independent critical
          604 infrastructure detection (DetDSCI) methodology that first determines the
          605 spatial resolution of the input image using a classification model, then
          606 analyses the image using the appropriate detector for that spatial resolution.
          607 The present study targets two representative classes, airports and electrical
          608 substations. Our experiments show that DetDSCI methodology achieves up to
37.53% F1 improvement with respect to Faster R-CNN, one of the most influential
          610 detection models.
          611 </summary></entry><entry><id>http://arxiv.org/abs/2103.13460</id><title>Under Pressure: Learning to Detect Slip with Barometric Tactile Sensors (update)</title><updated>2021-09-23T09:06:49.522356+00:00</updated><author><name>Abhinav Grover</name></author><author><name>Christopher Grebe</name></author><author><name>Philippe Nadeau</name></author><author><name>Jonathan Kelly</name></author><link href="http://arxiv.org/abs/2103.13460" rel="alternate"/><summary>Despite the utility of tactile information, tactile sensors have yet to be
          612 widely deployed in industrial robotics settings. Part of the challenge lies in
          613 identifying slip and other key events from the tactile data stream. In this
          614 paper, we present a learning-based method to detect slip using barometric
          615 tactile sensors. Although these sensors have a low resolution, they have many
          616 other desirable properties including high reliability and durability, a very
          617 slim profile, and a low cost. We are able to achieve slip detection accuracies
          618 of greater than 91% while being robust to the speed and direction of the slip
          619 motion. Further, we test our detector on two robot manipulation tasks involving
          620 common household objects and demonstrate successful generalization to
          621 real-world scenarios not seen during training. We show that barometric tactile
          622 sensing technology, combined with data-driven learning, is potentially suitable
          623 for complex manipulation tasks such as slip compensation.
          624 </summary></entry><entry><id>http://arxiv.org/abs/2102.08633</id><title>Open-Retrieval Conversational Machine Reading (update)</title><updated>2021-09-23T09:06:49.521944+00:00</updated><author><name>Yifan Gao</name></author><author><name>Jingjing Li</name></author><author><name>Michael R. Lyu</name></author><author><name>Irwin King</name></author><link href="http://arxiv.org/abs/2102.08633" rel="alternate"/><summary>In conversational machine reading, systems need to interpret natural language
          625 rules, answer high-level questions such as "May I qualify for VA health care
          626 benefits?", and ask follow-up clarification questions whose answer is necessary
          627 to answer the original question. However, existing works assume the rule text
          628 is provided for each user question, which neglects the essential retrieval step
          629 in real scenarios. In this work, we propose and investigate an open-retrieval
          630 setting of conversational machine reading. In the open-retrieval setting, the
          631 relevant rule texts are unknown so that a system needs to retrieve
          632 question-relevant evidence from a collection of rule texts, and answer users'
          633 high-level questions according to multiple retrieved rule texts in a
          634 conversational manner. We propose MUDERN, a Multi-passage Discourse-aware
          635 Entailment Reasoning Network which extracts conditions in the rule texts
          636 through discourse segmentation, conducts multi-passage entailment reasoning to
          637 answer user questions directly, or asks clarification follow-up questions to
elicit more information. On our created OR-ShARC dataset, MUDERN achieves the
          639 state-of-the-art performance, outperforming existing single-passage
          640 conversational machine reading models as well as a new multi-passage
          641 conversational machine reading baseline by a large margin. In addition, we
          642 conduct in-depth analyses to provide new insights into this new setting and our
          643 model.
          644 </summary></entry><entry><id>http://arxiv.org/abs/2102.07358</id><title>Weak Adaptation Learning -- Addressing Cross-domain Data Insufficiency with Weak Annotator (update)</title><updated>2021-09-23T09:06:49.521525+00:00</updated><author><name>Shichao Xu</name></author><author><name>Lixu Wang</name></author><author><name>Yixuan Wang</name></author><author><name>Qi Zhu</name></author><link href="http://arxiv.org/abs/2102.07358" rel="alternate"/><summary>Data quantity and quality are crucial factors for data-driven learning
          645 methods. In some target problem domains, there are not many data samples
          646 available, which could significantly hinder the learning process. While data
          647 from similar domains may be leveraged to help through domain adaptation,
          648 obtaining high-quality labeled data for those source domains themselves could
          649 be difficult or costly. To address such challenges on data insufficiency for
          650 classification problem in a target domain, we propose a weak adaptation
          651 learning (WAL) approach that leverages unlabeled data from a similar source
          652 domain, a low-cost weak annotator that produces labels based on task-specific
          653 heuristics, labeling rules, or other methods (albeit with inaccuracy), and a
          654 small amount of labeled data in the target domain. Our approach first conducts
          655 a theoretical analysis on the error bound of the trained classifier with
          656 respect to the data quantity and the performance of the weak annotator, and
          657 then introduces a multi-stage weak adaptation learning method to learn an
          658 accurate classifier by lowering the error bound. Our experiments demonstrate
          659 the effectiveness of our approach in learning an accurate classifier with
          660 limited labeled data in the target domain and unlabeled data in the source
          661 domain.
          662 </summary></entry><entry><id>http://arxiv.org/abs/2102.04394</id><title>Learning with Density Matrices and Random Features (update)</title><updated>2021-09-23T09:06:49.521043+00:00</updated><author><name>Fabio A. González</name></author><author><name>Alejandro Gallego</name></author><author><name>Santiago Toledo-Cortés</name></author><author><name>Vladimir Vargas-Calderón</name></author><link href="http://arxiv.org/abs/2102.04394" rel="alternate"/><summary>A density matrix describes the statistical state of a quantum system. It is a
          663 powerful formalism to represent both the quantum and classical uncertainty of
          664 quantum systems and to express different statistical operations such as
          665 measurement, system combination and expectations as linear algebra operations.
          666 This paper explores how density matrices can be used as a building block to
          667 build machine learning models exploiting their ability to straightforwardly
          668 combine linear algebra and probability. One of the main results of the paper is
          669 to show that density matrices coupled with random Fourier features could
          670 approximate arbitrary probability distributions over $\mathbb{R}^n$. Based on
          671 this finding the paper builds different models for density estimation,
          672 classification and regression. These models are differentiable, so it is
          673 possible to integrate them with other differentiable components, such as deep
          674 learning architectures and to learn their parameters using gradient-based
          675 optimization. In addition, the paper presents optimization-less training
          676 strategies based on estimation and model averaging. The models are evaluated in
          677 benchmark tasks and the results are reported and discussed.
          678 </summary></entry><entry><id>http://arxiv.org/abs/2011.11152</id><title>Understanding and Scheduling Weight Decay (update)</title><updated>2021-09-23T09:06:49.520655+00:00</updated><author><name>Zeke Xie</name></author><author><name>Issei Sato</name></author><author><name>Masashi Sugiyama</name></author><link href="http://arxiv.org/abs/2011.11152" rel="alternate"/><summary>Weight decay is a popular and even necessary regularization technique for
          679 training deep neural networks that generalize well. Previous work usually
          680 interpreted weight decay as a Gaussian prior from the Bayesian perspective.
          681 However, weight decay sometimes shows mysterious behaviors beyond the
          682 conventional understanding. For example, the optimal weight decay value tends
          683 to be zero given long enough training time. Moreover, existing work typically
          684 failed to recognize the importance of scheduling weight decay during training.
          685 Our work aims at theoretically understanding novel behaviors of weight decay
          686 and designing schedulers for weight decay in deep learning. This paper mainly
          687 has three contributions. First, we propose a novel theoretical interpretation
          688 of weight decay from the perspective of learning dynamics. Second, we propose a
          689 novel weight-decay linear scaling rule for large-batch training that
          690 proportionally increases weight decay rather than the learning rate as the
          691 batch size increases. Third, we provide an effective learning-rate-aware
          692 scheduler for weight decay, called the Stable Weight Decay (SWD) method, which,
          693 to the best of our knowledge, is the first practical design for weight decay
          694 scheduling. In our various experiments, the SWD method often makes improvements
          695 over $L_{2}$ Regularization and Decoupled Weight Decay.
          696 </summary></entry><entry><id>http://arxiv.org/abs/2011.02073</id><title>MBB: Model-Based Baseline for Efficient Reinforcement Learning (update)</title><updated>2021-09-23T09:06:49.520212+00:00</updated><author><name>Xubo Lyu</name></author><author><name>Site Li</name></author><author><name>Seth Siriya</name></author><author><name>Ye Pu</name></author><author><name>Mo Chen</name></author><link href="http://arxiv.org/abs/2011.02073" rel="alternate"/><summary>Model-free reinforcement learning (RL) is capable of learning control
          697 policies for high-dimensional, complex robotic tasks, but tends to be
          698 data-inefficient. Model-based RL tends to be more data-efficient but often
struggles to learn a high-dimensional model that is good enough for policy
          700 improvement. This limits its use to learning simple models for restrictive
          701 domains. Optimal control generates solutions without collecting any data,
          702 assuming an accurate model of the system and environment is known, which is
          703 often true in many control theory applications. However, optimal control cannot
          704 be scaled to problems with a high-dimensional state space. In this paper, we
          705 propose a novel approach to alleviate data inefficiency of model-free RL in
          706 high-dimensional problems by warm-starting the learning process using a
          707 lower-dimensional model-based solution. Particularly, we initialize a baseline
          708 function for the high-dimensional RL problem via supervision from a
          709 lower-dimensional value function, which can be obtained by solving a
          710 lower-dimensional problem with a known, approximate model using "classical"
          711 techniques such as value iteration or optimal control. Therefore, our approach
          712 implicitly exploits the model priors from simplified problem space to
          713 facilitate the policy learning in high-dimensional RL tasks. We demonstrate our
          714 approach on two representative robotic learning tasks and observe significant
          715 improvement in policy performance and learning efficiency. We also evaluate our
          716 method empirically with a third task.
          717 </summary></entry><entry><id>http://arxiv.org/abs/2004.12908</id><title>Omnidirectional Transfer for Quasilinear Lifelong Learning (update)</title><updated>2021-09-23T09:06:49.519512+00:00</updated><author><name>Joshua T. Vogelstein</name></author><author><name>Jayanta Dey</name></author><author><name>Hayden S. Helm</name></author><author><name>Will LeVine</name></author><author><name>Ronak D. Mehta</name></author><author><name>Ali Geisa</name></author><author><name>Haoyin Xu</name></author><author><name>Gido M. van de Ven</name></author><author><name>Emily Chang</name></author><author><name>Chenyu Gao</name></author><author><name>Weiwei Yang</name></author><author><name>Bryan Tower</name></author><author><name>Jonathan Larson</name></author><author><name>Christopher M. White</name></author><author><name>Carey E. Priebe</name></author><link href="http://arxiv.org/abs/2004.12908" rel="alternate"/><summary>In biological learning, data are used to improve performance not only on the
          718 current task, but also on previously encountered and as yet unencountered
          719 tasks. In contrast, classical machine learning starts from a blank slate, or
          720 tabula rasa, using data only for the single task at hand. While typical
          721 transfer learning algorithms can improve performance on future tasks, their
          722 performance on prior tasks degrades upon learning new tasks (called
          723 catastrophic forgetting). Many recent approaches for continual or lifelong
          724 learning have attempted to maintain performance given new tasks. But striving
          725 to avoid forgetting sets the goal unnecessarily low: the goal of lifelong
          726 learning, whether biological or artificial, should be to improve performance on
          727 all tasks (including past and future) with any new data. We propose
omnidirectional transfer learning algorithms, which include two special cases
          729 of interest: decision forests and deep networks. Our key insight is the
          730 development of the omni-voter layer, which ensembles representations learned
          731 independently on all tasks to jointly decide how to proceed on any given new
          732 data point, thereby improving performance on both past and future tasks. Our
          733 algorithms demonstrate omnidirectional transfer in a variety of simulated and
          734 real data scenarios, including tabular data, image data, spoken data, and
          735 adversarial tasks. Moreover, they do so with quasilinear space and time
          736 complexity.
          737 </summary></entry><entry><id>http://arxiv.org/abs/2109.10322</id><title>CondNet: Conditional Classifier for Scene Segmentation</title><updated>2021-09-23T09:06:49.519051+00:00</updated><author><name>Changqian Yu</name></author><author><name>Yuanjie Shao</name></author><author><name>Changxin Gao</name></author><author><name>Nong Sang</name></author><link href="http://arxiv.org/abs/2109.10322" rel="alternate"/><summary>The fully convolutional network (FCN) has achieved tremendous success in
          738 dense visual recognition tasks, such as scene segmentation. The last layer of
FCN is typically a global classifier (1x1 convolution) that assigns each pixel
a semantic label. We empirically show that this global classifier, ignoring
          741 the intra-class distinction, may lead to sub-optimal results.
          742 </summary></entry><entry><id>http://arxiv.org/abs/2109.10317</id><title>Introduction to Neural Network Verification</title><updated>2021-09-23T09:06:49.518738+00:00</updated><author><name>Aws Albarghouthi</name></author><link href="http://arxiv.org/abs/2109.10317" rel="alternate"/><summary>Deep learning has transformed the way we think of software and what it can
          743 do. But deep neural networks are fragile and their behaviors are often
          744 surprising. In many settings, we need to provide formal guarantees on the
          745 safety, security, correctness, or robustness of neural networks. This book
          746 covers foundational ideas from formal verification and their adaptation to
          747 reasoning about neural networks and deep learning.
          748 </summary></entry><entry><id>http://arxiv.org/abs/2109.10312</id><title>Example-Driven Model-Based Reinforcement Learning for Solving Long-Horizon Visuomotor Tasks</title><updated>2021-09-23T09:06:49.518267+00:00</updated><author><name>Bohan Wu</name></author><author><name>Suraj Nair</name></author><author><name>Li Fei-Fei</name></author><author><name>Chelsea Finn</name></author><link href="http://arxiv.org/abs/2109.10312" rel="alternate"/><summary>In this paper, we study the problem of learning a repertoire of low-level
          749 skills from raw images that can be sequenced to complete long-horizon
          750 visuomotor tasks. Reinforcement learning (RL) is a promising approach for
          751 acquiring short-horizon skills autonomously. However, the focus of RL
          752 algorithms has largely been on the success of those individual skills, more so
          753 than learning and grounding a large repertoire of skills that can be sequenced
          754 to complete extended multi-stage tasks. The latter demands robustness and
          755 persistence, as errors in skills can compound over time, and may require the
          756 robot to have a number of primitive skills in its repertoire, rather than just
          757 one. To this end, we introduce EMBR, a model-based RL method for learning
          758 primitive skills that are suitable for completing long-horizon visuomotor
          759 tasks. EMBR learns and plans using a learned model, critic, and success
          760 classifier, where the success classifier serves both as a reward function for
          761 RL and as a grounding mechanism to continuously detect if the robot should
          762 retry a skill when unsuccessful or under perturbations. Further, the learned
          763 model is task-agnostic and trained using data from all skills, enabling the
          764 robot to efficiently learn a number of distinct primitives. These visuomotor
          765 primitive skills and their associated pre- and post-conditions can then be
          766 directly combined with off-the-shelf symbolic planners to complete long-horizon
          767 tasks. On a Franka Emika robot arm, we find that EMBR enables the robot to
          768 complete three long-horizon visuomotor tasks at 85% success rate, such as
          769 organizing an office desk, a file cabinet, and drawers, which require
          770 sequencing up to 12 skills, involve 14 unique learned primitives, and demand
          771 generalization to novel objects.
          772 </summary></entry><entry><id>http://arxiv.org/abs/2109.10303</id><title>Computing Complexity-aware Plans Using Kolmogorov Complexity</title><updated>2021-09-23T09:06:49.517919+00:00</updated><author><name>Elis Stefansson</name></author><author><name>Karl H. Johansson</name></author><link href="http://arxiv.org/abs/2109.10303" rel="alternate"/><summary>In this paper, we introduce complexity-aware planning for finite-horizon
          773 deterministic finite automata with rewards as outputs, based on Kolmogorov
          774 complexity. Kolmogorov complexity is considered since it can detect
          775 computational regularities of deterministic optimal policies. We present a
          776 planning objective yielding an explicit trade-off between a policy's
          777 performance and complexity. It is proven that maximising this objective is
          778 non-trivial in the sense that dynamic programming is infeasible. We present two
          779 algorithms obtaining low-complexity policies, where the first algorithm obtains
          780 a low-complexity optimal policy, and the second algorithm finds a policy
          781 maximising performance while maintaining local (stage-wise) complexity
          782 constraints. We evaluate the algorithms on a simple navigation task for a
          783 mobile robot, where our algorithms yield low-complexity policies that concur
          784 with intuition.
          785 </summary></entry><entry><id>http://arxiv.org/abs/2109.10285</id><title>Early and Revocable Time Series Classification</title><updated>2021-09-23T09:06:49.517510+00:00</updated><author><name>Youssef Achenchabe</name></author><author><name>Alexis Bondu</name></author><author><name>Antoine Cornuéjols</name></author><author><name>Vincent Lemaire</name></author><link href="http://arxiv.org/abs/2109.10285" rel="alternate"/><summary>Many approaches have been proposed for early classification of time series in
light of its significance in a wide range of applications including healthcare,
transportation and finance. Until now, the early classification problem has
been dealt with by considering only irrevocable decisions. This paper
introduces a new problem called early and revocable time series classification,
where the decision maker can revoke its earlier decisions based on the newly
available measurements. In order to formalize and tackle this problem, we
propose a new cost-based framework and derive two new approaches from it. The
first approach does not explicitly consider the cost of changing decisions,
while the second one does. Extensive experiments are conducted to evaluate
these approaches on a large benchmark of real datasets. The empirical results
obtained convincingly show (i) that the ability to revoke decisions
significantly improves performance over the irrevocable regime, and (ii) that
taking into account the cost of changing decisions brings even better results
in general. Keywords: revocable decisions, cost estimation, online decision making
          800 </summary></entry><entry><id>http://arxiv.org/abs/2109.10246</id><title>Does Vision-and-Language Pretraining Improve Lexical Grounding?</title><updated>2021-09-23T09:06:49.517131+00:00</updated><author><name>Tian Yun</name></author><author><name>Chen Sun</name></author><author><name>Ellie Pavlick</name></author><link href="http://arxiv.org/abs/2109.10246" rel="alternate"/><summary>Linguistic representations derived from text alone have been criticized for
          801 their lack of grounding, i.e., connecting words to their meanings in the
          802 physical world. Vision-and-Language (VL) models, trained jointly on text and
          803 image or video data, have been offered as a response to such criticisms.
          804 However, while VL pretraining has shown success on multimodal tasks such as
          805 visual question answering, it is not yet known how the internal linguistic
          806 representations themselves compare to their text-only counterparts. This paper
          807 compares the semantic representations learned via VL vs. text-only pretraining
          808 for two recent VL models using a suite of analyses (clustering, probing, and
          809 performance on a commonsense question answering task) in a language-only
          810 setting. We find that the multimodal models fail to significantly outperform
          811 the text-only variants, suggesting that future work is required if multimodal
          812 pretraining is to be pursued as a means of improving NLP in general.
          813 </summary></entry><entry><id>http://arxiv.org/abs/2109.10231</id><title>SalienTrack: providing salient information for semi-automated self-tracking feedback with model explanations</title><updated>2021-09-23T09:06:49.516665+00:00</updated><author><name>Yunlong Wang</name></author><author><name>Jiaying Liu</name></author><author><name>Homin Park</name></author><author><name>Jordan Schultz-McArdle</name></author><author><name>Stephanie Rosenthal</name></author><author><name>Brian Y Lim</name></author><link href="http://arxiv.org/abs/2109.10231" rel="alternate"/><summary>Self-tracking can improve people's awareness of their unhealthy behaviors to
          814 provide insights towards behavior change. Prior work has explored how
          815 self-trackers reflect on their logged data, but it remains unclear how much
          816 they learn from the tracking feedback, and which information is more useful.
          817 Indeed, the feedback can still be overwhelming, and making it concise can
          818 improve learning by increasing focus and reducing interpretation burden. We
          819 conducted a field study of mobile food logging with two feedback modes (manual
          820 journaling and automatic annotation of food images) and identified learning
          821 differences regarding nutrition, assessment, behavioral, and contextual
          822 information. We propose a Self-Tracking Feedback Saliency Framework to define
          823 when to provide feedback, on which specific information, why those details, and
          824 how to present them (as manual inquiry or automatic feedback). We propose
          825 SalienTrack to implement these requirements. Using the data collected from the
          826 user study, we trained a machine learning model to predict whether a user would
          827 learn from each tracked event. Using explainable AI (XAI) techniques, we
          828 identified the most salient features per instance and why they lead to positive
          829 learning outcomes. We discuss implications for learnability in self-tracking,
          830 and how adding model explainability expands opportunities for improving
          831 feedback experience.
          832 </summary></entry><entry><id>http://arxiv.org/abs/2109.10217</id><title>Shape Inference and Grammar Induction for Example-based Procedural Generation</title><updated>2021-09-23T09:06:49.516292+00:00</updated><author><name>Gillis Hermans</name></author><author><name>Thomas Winters</name></author><author><name>Luc De Raedt</name></author><link href="http://arxiv.org/abs/2109.10217" rel="alternate"/><summary>Designers increasingly rely on procedural generation for automatic generation
          833 of content in various industries. These techniques require extensive knowledge
of the desired content and of how to implement such procedural
          835 methods. Algorithms for learning interpretable generative models from example
          836 content could alleviate both difficulties. We propose SIGI, a novel method for
          837 inferring shapes and inducing a shape grammar from grid-based 3D building
          838 examples. This interpretable grammar is well-suited for co-creative design.
          839 Applied to Minecraft buildings, we show how the shape grammar can be used to
          840 automatically generate new buildings in a similar style.
          841 </summary></entry><entry><id>http://arxiv.org/abs/2109.10200</id><title>Off-line approximate dynamic programming for the vehicle routing problem with stochastic customers and demands via decentralized decision-making</title><updated>2021-09-23T09:06:49.515928+00:00</updated><author><name>Mohsen Dastpak</name></author><author><name>Fausto Errico</name></author><link href="http://arxiv.org/abs/2109.10200" rel="alternate"/><summary>This paper studies a stochastic variant of the vehicle routing problem (VRP)
          842 where both customer locations and demands are uncertain. In particular,
          843 potential customers are not restricted to a predefined customer set but are
          844 continuously spatially distributed in a given service area. The objective is to
          845 maximize the served demands while fulfilling vehicle capacities and time
          846 restrictions. We call this problem the VRP with stochastic customers and
          847 demands (VRPSCD). For this problem, we first propose a Markov Decision Process
          848 (MDP) formulation representing the classical centralized decision-making
          849 perspective where one decision-maker establishes the routes of all vehicles.
          850 While the resulting formulation turns out to be intractable, it provides us
          851 with the ground to develop a new MDP formulation of the VRPSCD representing a
          852 decentralized decision-making framework, where vehicles autonomously establish
          853 their own routes. This new formulation allows us to develop several strategies
          854 to reduce the dimension of the state and action spaces, resulting in a
          855 considerably more tractable problem. We solve the decentralized problem via
          856 Reinforcement Learning, and in particular, we develop a Q-learning algorithm
          857 featuring state-of-the-art acceleration techniques such as Replay Memory and
          858 Double Q Network. Computational results show that our method considerably
          859 outperforms two commonly adopted benchmark policies (random and heuristic).
          860 Moreover, when comparing with existing literature, we show that our approach
          861 can compete with specialized methods developed for the particular case of the
          862 VRPSCD where customer locations and expected demands are known in advance.
          863 Finally, we show that the value functions and policies obtained by our
          864 algorithm can be easily embedded in Rollout algorithms, thus further improving
their performance.
          866 </summary></entry><entry><id>http://arxiv.org/abs/2109.10199</id><title>Design and implementation of a parsimonious neuromorphic PID for onboard altitude control for MAVs using neuromorphic processors</title><updated>2021-09-23T09:06:49.515541+00:00</updated><author><name>Stein Stroobants</name></author><author><name>Julien Dupeyroux</name></author><author><name>Guido de Croon</name></author><link href="http://arxiv.org/abs/2109.10199" rel="alternate"/><summary>The great promises of neuromorphic sensing and processing for robotics have
          867 led researchers and engineers to investigate novel models for robust and
          868 reliable control of autonomous robots (navigation, obstacle detection and
          869 avoidance, etc.), especially for quadrotors in challenging contexts such as
          870 drone racing and aggressive maneuvers. Using spiking neural networks, these
          871 models can be run on neuromorphic hardware to benefit from outstanding update
          872 rates and high energy efficiency. Yet, low-level controllers are often
          873 neglected and remain outside of the neuromorphic loop. Designing low-level
          874 neuromorphic controllers is crucial to remove the standard PID, and therefore
          875 benefit from all the advantages of closing the neuromorphic loop. In this
          876 paper, we propose a parsimonious and adjustable neuromorphic PID controller,
          877 endowed with a minimal number of 93 neurons sparsely connected to achieve
          878 autonomous, onboard altitude control of a quadrotor equipped with Intel's Loihi
          879 neuromorphic chip. We successfully demonstrate the robustness of our proposed
          880 network in a set of experiments where the quadrotor is requested to reach a
          881 target altitude from take-off. Our results confirm the suitability of such
          882 low-level neuromorphic controllers, ultimately with a very high update
          883 frequency.
          884 </summary></entry><entry><id>http://arxiv.org/abs/2109.10187</id><title>Oriented Object Detection in Aerial Images Based on Area Ratio of Parallelogram</title><updated>2021-09-23T09:06:49.515064+00:00</updated><author><name>Xinyu Yu</name></author><author><name>Mi Lin</name></author><author><name>Jiangping Lu</name></author><author><name>Linlin Ou</name></author><link href="http://arxiv.org/abs/2109.10187" rel="alternate"/><summary>Rotated object detection is a challenging task in aerial images as the object
in aerial images is displayed in arbitrary directions and is usually densely
packed. Although considerable progress has been made, existing
regression-based rotation detectors still suffer from the problem of
discontinuous boundaries, which is directly caused by angular periodicity or
corner ordering. In this paper, we propose a simple yet effective framework to
          890 address the above challenges. Instead of directly regressing the five
          891 parameters (coordinates of the central point, width, height, and rotation
          892 angle) or the four vertices, we use the area ratio of parallelogram (ARP) to
          893 accurately describe a multi-oriented object. Specifically, we regress
          894 coordinates of center point, height and width of minimum circumscribed
          895 rectangle of oriented object and three area ratios {\lambda}_1, {\lambda}_2 and
          896 {\lambda}_3. This may facilitate the offset learning and avoid the issue of
          897 angular periodicity or label points sequence for oriented objects. To further
remedy the confusion issue for nearly horizontal objects, we employ the area ratio
          899 between the object and its horizontal bounding box (minimum circumscribed
          900 rectangle) to guide the selection of horizontal or oriented detection for each
          901 object. We also propose a rotation efficient IoU loss (R-EIoU) to connect the
horizontal bounding box with the three area ratios and improve the accuracy of
the rotated bounding box. Experimental results on three remote sensing
          904 datasets including HRSC2016, DOTA and UCAS-AOD and scene text including
          905 ICDAR2015 show that our method achieves superior detection performance compared
with many state-of-the-art approaches. The code and model will be released
upon publication of the paper.
          908 </summary></entry><entry><id>http://arxiv.org/abs/2109.10173</id><title>Long-Term Exploration in Persistent MDPs</title><updated>2021-09-23T09:06:49.514674+00:00</updated><author><name>Leonid Ugadiarov</name></author><author><name>Alexey Skrynnik</name></author><author><name>Aleksandr I. Panov</name></author><link href="http://arxiv.org/abs/2109.10173" rel="alternate"/><summary>Exploration is an essential part of reinforcement learning, which restricts
the quality of the learned policy. Hard-exploration environments are defined by
          910 huge state space and sparse rewards. In such conditions, an exhaustive
          911 exploration of the environment is often impossible, and the successful training
          912 of an agent requires a lot of interaction steps. In this paper, we propose an
          913 exploration method called Rollback-Explore (RbExplore), which utilizes the
          914 concept of the persistent Markov decision process, in which agents during
          915 training can roll back to visited states. We test our algorithm in the
          916 hard-exploration Prince of Persia game, without rewards and domain knowledge.
          917 At all used levels of the game, our agent outperforms or shows comparable
          918 results with state-of-the-art curiosity methods with knowledge-based intrinsic
          919 motivation: ICM and RND. An implementation of RbExplore can be found at
          920 https://github.com/cds-mipt/RbExplore.
          921 </summary></entry><entry><id>http://arxiv.org/abs/2109.10149</id><title>Interpretable Directed Diversity: Leveraging Model Explanations for Iterative Crowd Ideation</title><updated>2021-09-23T09:06:49.514210+00:00</updated><author><name>Yunlong Wang</name></author><author><name>Priyadarshini Venkatesh</name></author><author><name>Brian Y. Lim</name></author><link href="http://arxiv.org/abs/2109.10149" rel="alternate"/><summary>Feedback can help crowdworkers to improve their ideations. However, current
          922 feedback methods require human assessment from facilitators or peers. This is
          923 not scalable to large crowds. We propose Interpretable Directed Diversity to
          924 automatically predict ideation quality and diversity scores, and provide AI
          925 explanations - Attribution, Contrastive Attribution, and Counterfactual
          926 Suggestions - for deeper feedback on why ideations were scored (low), and how
          927 to get higher scores. These explanations provide multi-faceted feedback as
users iteratively improve their ideation. We conducted think-aloud and
          929 controlled user studies to understand how various explanations are used, and
          930 evaluated whether explanations improve ideation diversity and quality. Users
          931 appreciated that explanation feedback helped focus their efforts and provided
          932 directions for improvement. This resulted in explanations improving diversity
          933 compared to no feedback or feedback with predictions only. Hence, our approach
          934 opens opportunities for explainable AI towards scalable and rich feedback for
          935 iterative crowd ideation.
          936 </summary></entry><entry><id>http://arxiv.org/abs/2109.10129</id><title>Learning General Optimal Policies with Graph Neural Networks: Expressive Power, Transparency, and Limits</title><updated>2021-09-23T09:06:49.513806+00:00</updated><author><name>Simon Ståhlberg</name></author><author><name>Blai Bonet</name></author><author><name>Hector Geffner</name></author><link href="http://arxiv.org/abs/2109.10129" rel="alternate"/><summary>It has been recently shown that general policies for many classical planning
          937 domains can be expressed and learned in terms of a pool of features defined
          938 from the domain predicates using a description logic grammar. At the same time,
          939 most description logics correspond to a fragment of $k$-variable counting logic
($C_k$) for $k=2$, which has been shown to provide a tight characterization of
          941 the expressive power of graph neural networks. In this work, we make use of
          942 these results to understand the power and limits of using graph neural networks
          943 (GNNs) for learning optimal general policies over a number of tractable
          944 planning domains where such policies are known to exist. For this, we train a
          945 simple GNN in a supervised manner to approximate the optimal value function
          946 $V^{*}(s)$ of a number of sample states $s$. As predicted by the theory, it is
          947 observed that general optimal policies are obtained in domains where general
          948 optimal value functions can be defined with $C_2$ features but not in those
          949 requiring more expressive $C_3$ features. In addition, it is observed that the
          950 features learned are in close correspondence with the features needed to
          951 express $V^{*}$ in closed form. The theory and the analysis of the domains let
          952 us understand the features that are actually learned as well as those that
          953 cannot be learned in this way, and let us move in a principled manner from a
          954 combinatorial optimization approach to learning general policies to a
potentially more robust and scalable approach based on deep learning.
          956 </summary></entry><entry><id>http://arxiv.org/abs/2109.10106</id><title>Distributed Mission Planning of Complex Tasks for Heterogeneous Multi-Robot Teams</title><updated>2021-09-23T09:06:49.513430+00:00</updated><author><name>Barbara Arbanas Ferreira</name></author><author><name>Tamara Petrović</name></author><author><name>Stjepan Bogdan</name></author><link href="http://arxiv.org/abs/2109.10106" rel="alternate"/><summary>In this paper, we propose a distributed multi-stage optimization method for
          957 planning complex missions for heterogeneous multi-robot teams. This class of
          958 problems involves tasks that can be executed in different ways and are
          959 associated with cross-schedule dependencies that constrain the schedules of the
          960 different robots in the system. The proposed approach involves a
          961 multi-objective heuristic search of the mission, represented as a hierarchical
          962 tree that defines the mission goal. This procedure outputs several favorable
          963 ways to fulfill the mission, which directly feed into the next stage of the
          964 method. We propose a distributed metaheuristic based on evolutionary
          965 computation to allocate tasks and generate schedules for the set of chosen
          966 decompositions. The method is evaluated in a simulation setup of an automated
          967 greenhouse use case, where we demonstrate the method's ability to adapt the
          968 planning strategy depending on the available robots and the given optimization
          969 criteria.
          970 </summary></entry><entry><id>http://arxiv.org/abs/2109.10100</id><title>A Novel Structured Natural Gradient Descent for Deep Learning</title><updated>2021-09-23T09:06:49.513082+00:00</updated><author><name>Weihua Liu</name></author><author><name>Xiabi Liu</name></author><link href="http://arxiv.org/abs/2109.10100" rel="alternate"/><summary>Natural gradient descent (NGD) provided deep insights and powerful tools to
deep neural networks. However, the computation of the Fisher information matrix
becomes more and more difficult as the network structure grows large and
          973 complex. This paper proposes a new optimization method whose main idea is to
          974 accurately replace the natural gradient optimization by reconstructing the
          975 network. More specifically, we reconstruct the structure of the deep neural
          976 network, and optimize the new network using traditional gradient descent (GD).
The reconstructed network achieves the same effect as optimizing with
natural gradient descent. Experimental results show that our optimization
          979 method can accelerate the convergence of deep network models and achieve better
          980 performance than GD while sharing its computational simplicity.
          981 </summary></entry><entry><id>http://arxiv.org/abs/2109.10086</id><title>SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval</title><updated>2021-09-23T09:06:49.512667+00:00</updated><author><name>Thibault Formal</name></author><author><name>Carlos Lassance</name></author><author><name>Benjamin Piwowarski</name></author><author><name>Stéphane Clinchant</name></author><link href="http://arxiv.org/abs/2109.10086" rel="alternate"/><summary>In neural Information Retrieval (IR), ongoing research is directed towards
          982 improving the first retriever in ranking pipelines. Learning dense embeddings
          983 to conduct retrieval using efficient approximate nearest neighbors methods has
          984 proven to work well. Meanwhile, there has been a growing interest in learning
          985 \emph{sparse} representations for documents and queries, that could inherit
          986 from the desirable properties of bag-of-words models such as the exact matching
          987 of terms and the efficiency of inverted indexes. Introduced recently, the
          988 SPLADE model provides highly sparse representations and competitive results
          989 with respect to state-of-the-art dense and sparse approaches. In this paper, we
          990 build on SPLADE and propose several significant improvements in terms of
          991 effectiveness and/or efficiency. More specifically, we modify the pooling
          992 mechanism, benchmark a model solely based on document expansion, and introduce
          993 models trained with distillation. We also report results on the BEIR benchmark.
          994 Overall, SPLADE is considerably improved with more than $9$\% gains on NDCG@10
          995 on TREC DL 2019, leading to state-of-the-art results on the BEIR benchmark.
          996 </summary></entry><entry><id>http://arxiv.org/abs/2109.10085</id><title>Heterogeneous Ensemble for ESG Ratings Prediction</title><updated>2021-09-23T09:06:49.512201+00:00</updated><author><name>Tim Krappel</name></author><author><name>Alex Bogun</name></author><author><name>Damian Borth</name></author><link href="http://arxiv.org/abs/2109.10085" rel="alternate"/><summary>Over the past years, topics ranging from climate change to human rights have
          997 seen increasing importance for investment decisions. Hence, investors (asset
          998 managers and asset owners) who wanted to incorporate these issues started to
          999 assess companies based on how they handle such topics. For this assessment,
         1000 investors rely on specialized rating agencies that issue ratings along the
         1001 environmental, social and governance (ESG) dimensions. Such ratings allow them
         1002 to make investment decisions in favor of sustainability. However, rating
         1003 agencies base their analysis on subjective assessment of sustainability
reports, which are not provided by every company. Furthermore, due to the human
labor involved, rating agencies currently face the challenge of scaling up
coverage in a timely manner.
         1007 </summary></entry><entry><id>http://arxiv.org/abs/2109.10065</id><title>Comparison of Neural Network based Soft Computing Techniques for Electromagnetic Modeling of a Microstrip Patch Antenna</title><updated>2021-09-23T09:06:49.511839+00:00</updated><author><name>Yuvraj Singh Malhi</name></author><author><name>Navneet Gupta</name></author><link href="http://arxiv.org/abs/2109.10065" rel="alternate"/><summary>This paper presents the comparison of various neural networks and algorithms
         1008 based on accuracy, quickness, and consistency for antenna modelling. Using
         1009 Nntool by MATLAB, 22 different combinations of networks and training algorithms
         1010 are used to predict the dimensions of a rectangular microstrip antenna using
         1011 dielectric constant, height of substrate, and frequency of operation as input.
         1012 Comparison and characterization of networks is done based on accuracy, mean
         1013 square error, and training time. Algorithms, on the other hand, are analyzed by
         1014 their accuracy, speed, reliability, and smoothness in the training process.
         1015 Finally, these results are analyzed, and recommendations are made for each
         1016 neural network and algorithm based on uses, advantages, and disadvantages. For
example, it is observed that the Reduced Radial Basis network is the most accurate
         1018 network and Scaled Conjugate Gradient is the most reliable algorithm for
         1019 electromagnetic modelling. This paper will help a researcher find the optimum
network and algorithm directly without time-consuming experimentation.
         1021 </summary></entry><entry><id>http://arxiv.org/abs/2109.10057</id><title>LOTR: Face Landmark Localization Using Localization Transformer</title><updated>2021-09-23T09:06:49.511324+00:00</updated><author><name>Ukrit Watchareeruetai</name></author><author><name>Benjaphan Sommanna</name></author><author><name>Sanjana Jain</name></author><author><name>Pavit Noinongyao</name></author><author><name>Ankush Ganguly</name></author><author><name>Aubin Samacoits</name></author><author><name>Samuel W.F. Earp</name></author><author><name>Nakarin Sritrakool</name></author><link href="http://arxiv.org/abs/2109.10057" rel="alternate"/><summary>This paper presents a novel Transformer-based facial landmark localization
         1022 network named Localization Transformer (LOTR). The proposed framework is a
         1023 direct coordinate regression approach leveraging a Transformer network to
         1024 better utilize the spatial information in the feature map. An LOTR model
         1025 consists of three main modules: 1) a visual backbone that converts an input
         1026 image into a feature map, 2) a Transformer module that improves the feature
         1027 representation from the visual backbone, and 3) a landmark prediction head that
         1028 directly predicts the landmark coordinates from the Transformer's
         1029 representation. Given cropped-and-aligned face images, the proposed LOTR can be
         1030 trained end-to-end without requiring any post-processing steps. This paper also
         1031 introduces the smooth-Wing loss function, which addresses the gradient
         1032 discontinuity of the Wing loss, leading to better convergence than standard
         1033 loss functions such as L1, L2, and Wing loss. Experimental results on the JD
         1034 landmark dataset provided by the First Grand Challenge of 106-Point Facial
         1035 Landmark Localization indicate the superiority of LOTR over the existing
         1036 methods on the leaderboard and two recent heatmap-based approaches.
         1037 </summary></entry><entry><id>http://arxiv.org/abs/2109.10047</id><title>Search For Deep Graph Neural Networks</title><updated>2021-09-23T09:06:49.510946+00:00</updated><author><name>Guosheng Feng</name></author><author><name>Chunnan Wang</name></author><author><name>Hongzhi Wang</name></author><link href="http://arxiv.org/abs/2109.10047" rel="alternate"/><summary>Current GNN-oriented NAS methods focus on the search for different layer
         1038 aggregate components with shallow and simple architectures, which are limited
         1039 by the 'over-smooth' problem. To further explore the benefits from structural
         1040 diversity and depth of GNN architectures, we propose a GNN generation pipeline
         1041 with a novel two-stage search space, which aims at automatically generating
high-performance yet transferable deep GNN models in a block-wise manner.
Meanwhile, to alleviate the 'over-smooth' problem, we incorporate multiple
flexible residual connections in our search space and apply identity mapping in
the basic GNN layers. For the search algorithm, we use deep Q-learning with an
epsilon-greedy exploration strategy and reward reshaping. Extensive experiments
on real-world datasets show that our generated GNN models outperform existing
         1048 manually designed and NAS-based ones.
         1049 </summary></entry><entry><id>http://arxiv.org/abs/2109.10034</id><title>Learning offline: memory replay in biological and artificial reinforcement learning</title><updated>2021-09-23T09:06:49.510518+00:00</updated><author><name>Emma L. Roscow</name></author><author><name>Raymond Chua</name></author><author><name>Rui Ponte Costa</name></author><author><name>Matt W. Jones</name></author><author><name>Nathan Lepora</name></author><link href="http://arxiv.org/abs/2109.10034" rel="alternate"/><summary>Learning to act in an environment to maximise rewards is among the brain's
         1050 key functions. This process has often been conceptualised within the framework
         1051 of reinforcement learning, which has also gained prominence in machine learning
         1052 and artificial intelligence (AI) as a way to optimise decision-making. A common
         1053 aspect of both biological and machine reinforcement learning is the
         1054 reactivation of previously experienced episodes, referred to as replay. Replay
         1055 is important for memory consolidation in biological neural networks, and is key
         1056 to stabilising learning in deep neural networks. Here, we review recent
         1057 developments concerning the functional roles of replay in the fields of
         1058 neuroscience and AI. Complementary progress suggests how replay might support
         1059 learning processes, including generalisation and continual learning, affording
         1060 opportunities to transfer knowledge across the two fields to advance the
         1061 understanding of biological and artificial learning and memory.
</summary></entry><entry><id>http://arxiv.org/abs/2109.10020</id><title>Online Multi-horizon Transaction Metric Estimation with Multi-modal Learning in Payment Networks</title><updated>2021-09-23T09:06:49.510005+00:00</updated><author><name>Chin-Chia Michael Yeh</name></author><author><name>Zhongfang Zhuang</name></author><author><name>Junpeng Wang</name></author><author><name>Yan Zheng</name></author><author><name>Javid Ebrahimi</name></author><author><name>Ryan Mercer</name></author><author><name>Liang Wang</name></author><author><name>Wei Zhang</name></author><link href="http://arxiv.org/abs/2109.10020" rel="alternate"/><summary>Predicting metrics associated with entities' transactional behavior within
         1063 payment processing networks is essential for system monitoring. Multivariate
         1064 time series, aggregated from the past transaction history, can provide valuable
         1065 insights for such prediction. The general multivariate time series prediction
         1066 problem has been well studied and applied across several domains, including
         1067 manufacturing, medical, and entomology. However, new domain-related challenges
         1068 associated with the data such as concept drift and multi-modality have surfaced
         1069 in addition to the real-time requirements of handling the payment transaction
         1070 data at scale. In this work, we study the problem of multivariate time series
         1071 prediction for estimating transaction metrics associated with entities in the
         1072 payment transaction database. We propose a model with five unique components to
         1073 estimate the transaction metrics from multi-modality data. Four of these
         1074 components capture interaction, temporal, scale, and shape perspectives, and
         1075 the fifth component fuses these perspectives together. We also propose a hybrid
         1076 offline/online training scheme to address concept drift in the data and fulfill
         1077 the real-time requirements. Combining the estimation model with a graphical
         1078 user interface, the prototype transaction metric estimation system has
         1079 demonstrated its potential benefit as a tool for improving a payment processing
         1080 company's system monitoring capability.
         1081 </summary></entry><entry><id>http://arxiv.org/abs/2109.10016</id><title>CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval</title><updated>2021-09-23T09:06:49.509605+00:00</updated><author><name>Zhijian Hou</name></author><author><name>Chong-Wah Ngo</name></author><author><name>Wing Kwong Chan</name></author><link href="http://arxiv.org/abs/2109.10016" rel="alternate"/><summary>This paper tackles a recently proposed Video Corpus Moment Retrieval task.
         1082 This task is essential because advanced video retrieval applications should
         1083 enable users to retrieve a precise moment from a large video corpus. We propose
         1084 a novel CONtextual QUery-awarE Ranking~(CONQUER) model for effective moment
         1085 localization and ranking. CONQUER explores query context for multi-modal fusion
         1086 and representation learning in two different steps. The first step derives
         1087 fusion weights for the adaptive combination of multi-modal video content. The
         1088 second step performs bi-directional attention to tightly couple video and query
         1089 as a single joint representation for moment localization. As query context is
         1090 fully engaged in video representation learning, from feature fusion to
         1091 transformation, the resulting feature is user-centered and has a larger
          1092 capacity for capturing multi-modal signals specific to the query. We conduct studies
         1093 on two datasets, TVR for closed-world TV episodes and DiDeMo for open-world
         1094 user-generated videos, to investigate the potential advantages of fusing video
         1095 and query online as a joint representation for moment retrieval.
         1096 </summary></entry><entry><id>http://arxiv.org/abs/2109.10011</id><title>Unsupervised Abstract Reasoning for Raven's Problem Matrices</title><updated>2021-09-23T09:06:49.509153+00:00</updated><author><name>Tao Zhuo</name></author><author><name>Qiang Huang</name></author><author><name>Mohan Kankanhalli</name></author><link href="http://arxiv.org/abs/2109.10011" rel="alternate"/><summary>Raven's Progressive Matrices (RPM) is highly correlated with human
         1097 intelligence, and it has been widely used to measure the abstract reasoning
         1098 ability of humans. In this paper, to study the abstract reasoning capability of
         1099 deep neural networks, we propose the first unsupervised learning method for
         1100 solving RPM problems. Since the ground truth labels are not allowed, we design
         1101 a pseudo target based on the prior constraints of the RPM formulation to
         1102 approximate the ground truth label, which effectively converts the unsupervised
         1103 learning strategy into a supervised one. However, the correct answer is wrongly
         1104 labelled by the pseudo target, and thus the noisy contrast will lead to
         1105 inaccurate model training. To alleviate this issue, we propose to improve the
         1106 model performance with negative answers. Moreover, we develop a
         1107 decentralization method to adapt the feature representation to different RPM
         1108 problems. Extensive experiments on three datasets demonstrate that our method
         1109 even outperforms some of the supervised approaches. Our code is available at
         1110 https://github.com/visiontao/ncd.
         1111 </summary></entry><entry><id>http://arxiv.org/abs/2109.10007</id><title>Generating Local Maps of Science using Deep Bibliographic Coupling</title><updated>2021-09-23T09:06:49.508792+00:00</updated><author><name>Gaëlle Candel</name></author><author><name>David Naccache</name></author><link href="http://arxiv.org/abs/2109.10007" rel="alternate"/><summary>Bibliographic and co-citation coupling are two analytical methods widely used
         1112 to measure the degree of similarity between scientific papers. These approaches
         1113 are intuitive, easy to put into practice, and computationally cheap. Moreover,
         1114 they have been used to generate a map of science, allowing visualizing research
         1115 field interactions. Nonetheless, these methods do not work unless two papers
          1116 share a common reference, limiting their usability for papers with no direct
         1117 connection. In this work, we propose to extend bibliographic coupling to the
         1118 deep neighborhood, by using graph diffusion methods. This method allows
         1119 defining similarity between any two papers, making it possible to generate a
         1120 local map of science, highlighting field organization.
         1121 </summary></entry><entry><id>http://arxiv.org/abs/2109.09975</id><title>Fast nonlinear risk assessment for autonomous vehicles using learned conditional probabilistic models of agent futures</title><updated>2021-09-23T09:06:49.508363+00:00</updated><author><name>Ashkan Jasour</name></author><author><name>Xin Huang</name></author><author><name>Allen Wang</name></author><author><name>Brian C. William</name></author><link href="http://arxiv.org/abs/2109.09975" rel="alternate"/><summary>This paper presents fast non-sampling based methods to assess the risk for
         1122 trajectories of autonomous vehicles when probabilistic predictions of other
         1123 agents' futures are generated by deep neural networks (DNNs). The presented
         1124 methods address a wide range of representations for uncertain predictions
         1125 including both Gaussian and non-Gaussian mixture models to predict both agent
         1126 positions and control inputs conditioned on the scene contexts. We show that
         1127 the problem of risk assessment when Gaussian mixture models (GMMs) of agent
         1128 positions are learned can be solved rapidly to arbitrary levels of accuracy
         1129 with existing numerical methods. To address the problem of risk assessment for
         1130 non-Gaussian mixture models of agent position, we propose finding upper bounds
         1131 on risk using nonlinear Chebyshev's Inequality and sums-of-squares (SOS)
         1132 programming; they are both of interest as the former is much faster while the
         1133 latter can be arbitrarily tight. These approaches only require higher order
         1134 statistical moments of agent positions to determine upper bounds on risk. To
         1135 perform risk assessment when models are learned for agent control inputs as
         1136 opposed to positions, we propagate the moments of uncertain control inputs
         1137 through the nonlinear motion dynamics to obtain the exact moments of uncertain
         1138 position over the planning horizon. To this end, we construct deterministic
         1139 linear dynamical systems that govern the exact time evolution of the moments of
         1140 uncertain position in the presence of uncertain control inputs. The presented
         1141 methods are demonstrated on realistic predictions from DNNs trained on the
         1142 Argoverse and CARLA datasets and are shown to be effective for rapidly
         1143 assessing the probability of low probability events.
         1144 </summary></entry><entry><id>http://arxiv.org/abs/2109.09968</id><title>Generalization in Text-based Games via Hierarchical Reinforcement Learning</title><updated>2021-09-23T09:06:49.507912+00:00</updated><author><name>Yunqiu Xu</name></author><author><name>Meng Fang</name></author><author><name>Ling Chen</name></author><author><name>Yali Du</name></author><author><name>Chengqi Zhang</name></author><link href="http://arxiv.org/abs/2109.09968" rel="alternate"/><summary>Deep reinforcement learning provides a promising approach for text-based
         1145 games in studying natural language communication between humans and artificial
          1146 agents. However, generalization remains a major challenge, as the agents
         1147 depend critically on the complexity and variety of training tasks. In this
         1148 paper, we address this problem by introducing a hierarchical framework built
          1149 upon a knowledge graph-based RL agent. At the high level, a meta-policy is
         1150 executed to decompose the whole game into a set of subtasks specified by
          1151 textual goals, and select one of them based on the KG. Then a sub-policy at the
         1152 low level is executed to conduct goal-conditioned reinforcement learning. We
         1153 carry out experiments on games with various difficulty levels and show that the
         1154 proposed method enjoys favorable generalizability.
          1155 </summary></entry><entry><id>http://arxiv.org/abs/2109.09960</id><title>Enforcing Mutual Consistency of Hard Regions for Semi-supervised Medical Image Segmentation</title><updated>2021-09-23T09:06:49.507378+00:00</updated><author><name>Yicheng Wu</name></author><author><name>Zongyuan Ge</name></author><author><name>Donghao Zhang</name></author><author><name>Minfeng Xu</name></author><author><name>Lei Zhang</name></author><author><name>Yong Xia</name></author><author><name>Jianfei Cai</name></author><link href="http://arxiv.org/abs/2109.09960" rel="alternate"/><summary>In this paper, we propose a novel mutual consistency network (MC-Net+) to
         1156 effectively exploit the unlabeled hard regions for semi-supervised medical
         1157 image segmentation. The MC-Net+ model is motivated by the observation that deep
         1158 models trained with limited annotations are prone to output highly uncertain
         1159 and easily mis-classified predictions in the ambiguous regions (e.g. adhesive
         1160 edges or thin branches) for the image segmentation task. Leveraging these
         1161 region-level challenging samples can make the semi-supervised segmentation
         1162 model training more effective. Therefore, our proposed MC-Net+ model consists
         1163 of two new designs. First, the model contains one shared encoder and multiple
          1164 slightly different decoders (i.e. using different up-sampling strategies). The
         1165 statistical discrepancy of multiple decoders' outputs is computed to denote the
         1166 model's uncertainty, which indicates the unlabeled hard regions. Second, a new
         1167 mutual consistency constraint is enforced between one decoder's probability
         1168 output and other decoders' soft pseudo labels. In this way, we minimize the
         1169 model's uncertainty during training and force the model to generate invariant
         1170 and low-entropy results in such challenging areas of unlabeled data, in order
         1171 to learn a generalized feature representation. We compared the segmentation
         1172 results of the MC-Net+ with five state-of-the-art semi-supervised approaches on
          1173 three public medical datasets. Extensive experiments with two common
         1174 semi-supervised settings demonstrate the superior performance of our model over
         1175 other existing methods, which sets a new state of the art for semi-supervised
         1176 medical image segmentation.
         1177 </summary></entry><entry><id>http://arxiv.org/abs/2109.09946</id><title>Identifying biases in legal data: An algorithmic fairness perspective</title><updated>2021-09-23T09:06:49.507009+00:00</updated><author><name>Jackson Sargent</name></author><author><name>Melanie Weber</name></author><link href="http://arxiv.org/abs/2109.09946" rel="alternate"/><summary>The need to address representation biases and sentencing disparities in legal
         1178 case data has long been recognized. Here, we study the problem of identifying
         1179 and measuring biases in large-scale legal case data from an algorithmic
         1180 fairness perspective. Our approach utilizes two regression models: A baseline
         1181 that represents the decisions of a "typical" judge as given by the data and a
         1182 "fair" judge that applies one of three fairness concepts. Comparing the
         1183 decisions of the "typical" judge and the "fair" judge allows for quantifying
         1184 biases across demographic groups, as we demonstrate in four case studies on
         1185 criminal data from Cook County (Illinois).
         1186 </summary></entry><entry><id>http://arxiv.org/abs/2109.09906</id><title>Audio Interval Retrieval using Convolutional Neural Networks</title><updated>2021-09-23T09:06:49.506567+00:00</updated><author><name>Ievgeniia Kuzminykh</name></author><author><name>Dan Shevchuk</name></author><author><name>Stavros Shiaeles</name></author><author><name>Bogdan Ghita</name></author><link href="http://arxiv.org/abs/2109.09906" rel="alternate"/><summary>Modern streaming services are increasingly labeling videos based on their
         1187 visual or audio content. This typically augments the use of technologies such
          1188 as AI and ML by allowing the use of natural speech for searching by keywords and
         1189 video descriptions. Prior research has successfully provided a number of
          1190 solutions for speech-to-text in the case of human speech, but this article
         1191 aims to investigate possible solutions to retrieve sound events based on a
         1192 natural language query, and estimate how effective and accurate they are. In
         1193 this study, we specifically focus on the YamNet, AlexNet, and ResNet-50
         1194 pre-trained models to automatically classify audio samples using their
         1195 respective melspectrograms into a number of predefined classes. The predefined
         1196 classes can represent sounds associated with actions within a video fragment.
         1197 Two tests are conducted to evaluate the performance of the models on two
         1198 separate problems: audio classification and intervals retrieval based on a
         1199 natural language query. Results show that the benchmarked models are comparable
         1200 in terms of performance, with YamNet slightly outperforming the other two
         1201 models. YamNet was able to classify single fixed-size audio samples with 92.7%
         1202 accuracy and 68.75% precision while its average accuracy on intervals retrieval
         1203 was 71.62% and precision was 41.95%. The investigated method may be embedded
         1204 into an automated event marking architecture for streaming services.
         1205 </summary></entry><entry><id>http://arxiv.org/abs/2109.09904</id><title>Symbols as a Lingua Franca for Bridging Human-AI Chasm for Explainable and Advisable AI Systems</title><updated>2021-09-23T09:06:49.506026+00:00</updated><author><name>Subbarao Kambhampati</name></author><author><name>Sarath Sreedharan</name></author><author><name>Mudit Verma</name></author><author><name>Yantian Zha</name></author><author><name>Lin Guan</name></author><link href="http://arxiv.org/abs/2109.09904" rel="alternate"/><summary>Despite the surprising power of many modern AI systems that often learn their
         1206 own representations, there is significant discontent about their inscrutability
         1207 and the attendant problems in their ability to interact with humans. While
         1208 alternatives such as neuro-symbolic approaches have been proposed, there is a
         1209 lack of consensus on what they are about. There are often two independent
         1210 motivations (i) symbols as a lingua franca for human-AI interaction and (ii)
          1211 symbols as (system-produced) abstractions used in the system's internal reasoning. The
         1212 jury is still out on whether AI systems will need to use symbols in their
         1213 internal reasoning to achieve general intelligence capabilities. Whatever the
          1214 answer turns out to be, the need for (human-understandable) symbols in human-AI
         1215 interaction seems quite compelling. Symbols, like emotions, may well not be
         1216 sine qua non for intelligence per se, but they will be crucial for AI systems
         1217 to interact with us humans--as we can neither turn off our emotions nor get by
         1218 without our symbols. In particular, in many human-designed domains, humans
         1219 would be interested in providing explicit (symbolic) knowledge and advice--and
         1220 expect machine explanations in kind. This alone requires AI systems to at least
         1221 do their I/O in symbolic terms. In this blue sky paper, we argue this point of
         1222 view, and discuss research directions that need to be pursued to allow for this
         1223 type of human-AI interaction.
         1224 </summary></entry><entry><id>http://arxiv.org/abs/2109.09889</id><title>A Simple Unified Framework for Anomaly Detection in Deep Reinforcement Learning</title><updated>2021-09-23T09:06:49.505560+00:00</updated><author><name>Hongming Zhang</name></author><author><name>Ke Sun</name></author><author><name>Bo Xu</name></author><author><name>Linglong Kong</name></author><author><name>Martin Müller</name></author><link href="http://arxiv.org/abs/2109.09889" rel="alternate"/><summary>Abnormal states in deep reinforcement learning~(RL) are states that are
         1225 beyond the scope of an RL policy. Such states may make the RL system unsafe and
         1226 impede its deployment in real scenarios. In this paper, we propose a simple yet
         1227 effective anomaly detection framework for deep RL algorithms that
         1228 simultaneously considers random, adversarial and out-of-distribution~(OOD)
         1229 state outliers. In particular, we attain the class-conditional distributions
         1230 for each action class under the Gaussian assumption, and rely on these
         1231 distributions to discriminate between inliers and outliers based on Mahalanobis
         1232 Distance~(MD) and Robust Mahalanobis Distance. We conduct extensive experiments
         1233 on Atari games that verify the effectiveness of our detection strategies. To
         1234 the best of our knowledge, we present the first in-detail study of statistical
         1235 and adversarial anomaly detection in deep RL algorithms. This simple unified
         1236 anomaly detection paves the way towards deploying safe RL systems in real-world
         1237 applications.
         1238 </summary></entry><entry><id>http://arxiv.org/abs/2109.09876</id><title>Context-Specific Representation Abstraction for Deep Option Learning</title><updated>2021-09-23T09:06:49.505061+00:00</updated><author><name>Marwa Abdulhai</name></author><author><name>Dong-Ki Kim</name></author><author><name>Matthew Riemer</name></author><author><name>Miao Liu</name></author><author><name>Gerald Tesauro</name></author><author><name>Jonathan P. How</name></author><link href="http://arxiv.org/abs/2109.09876" rel="alternate"/><summary>Hierarchical reinforcement learning has focused on discovering temporally
         1239 extended actions, such as options, that can provide benefits in problems
         1240 requiring extensive exploration. One promising approach that learns these
         1241 options end-to-end is the option-critic (OC) framework. We examine and show in
         1242 this paper that OC does not decompose a problem into simpler sub-problems, but
         1243 instead increases the size of the search over policy space with each option
         1244 considering the entire state space during learning. This issue can result in
         1245 practical limitations of this method, including sample inefficient learning. To
         1246 address this problem, we introduce Context-Specific Representation Abstraction
         1247 for Deep Option Learning (CRADOL), a new framework that considers both temporal
         1248 abstraction and context-specific representation abstraction to effectively
         1249 reduce the size of the search over policy space. Specifically, our method
         1250 learns a factored belief state representation that enables each option to learn
         1251 a policy over only a subsection of the state space. We test our method against
         1252 hierarchical, non-hierarchical, and modular recurrent neural network baselines,
         1253 demonstrating significant sample efficiency improvements in challenging
         1254 partially observable environments.
         1255 </summary></entry><entry><id>http://arxiv.org/abs/2109.09862</id><title>Language Identification with a Reciprocal Rank Classifier</title><updated>2021-09-23T09:06:49.504540+00:00</updated><author><name>Dominic Widdows</name></author><author><name>Chris Brew</name></author><link href="http://arxiv.org/abs/2109.09862" rel="alternate"/><summary>Language identification is a critical component of language processing
          1256 pipelines (Jauhiainen et al., 2019) and is not a solved problem in real-world
         1257 settings. We present a lightweight and effective language identifier that is
         1258 robust to changes of domain and to the absence of copious training data.
         1259 </summary></entry><entry><id>http://arxiv.org/abs/2109.09861</id><title>Generalized dynamic cognitive hierarchy models for strategic driving behavior</title><updated>2021-09-23T09:06:49.504112+00:00</updated><author><name>Atrisha Sarkar</name></author><author><name>Kate Larson</name></author><author><name>Krzysztof Czarnecki</name></author><link href="http://arxiv.org/abs/2109.09861" rel="alternate"/><summary>While there has been an increasing focus on the use of game theoretic models
         1260 for autonomous driving, empirical evidence shows that there are still open
         1261 questions around dealing with the challenges of common knowledge assumptions as
         1262 well as modeling bounded rationality. To address some of these practical
         1263 challenges, we develop a framework of generalized dynamic cognitive hierarchy
          1264 for both modelling naturalistic human driving behavior and behavior
         1265 planning for autonomous vehicles (AV). This framework is built upon a rich
         1266 model of level-0 behavior through the use of automata strategies, an
         1267 interpretable notion of bounded rationality through safety and maneuver
         1268 satisficing, and a robust response for planning. Based on evaluation on two
         1269 large naturalistic datasets as well as simulation of critical traffic
         1270 scenarios, we show that i) automata strategies are well suited for level-0
         1271 behavior in a dynamic level-k framework, and ii) the proposed robust response
         1272 to a heterogeneous population of strategic and non-strategic reasoners can be
         1273 an effective approach for game theoretic planning in AV.
         1274 </summary></entry><entry><id>http://arxiv.org/abs/2109.09844</id><title>Assessing clinical utility of Machine Learning and Artificial Intelligence approaches to analyze speech recordings in Multiple Sclerosis: A Pilot Study</title><updated>2021-09-23T09:06:49.503318+00:00</updated><author><name>Emil Svoboda</name></author><author><name>Tomáš Bořil</name></author><author><name>Jan Rusz</name></author><author><name>Tereza Tykalová</name></author><author><name>Dana Horáková</name></author><author><name>Charles R.G. Guttman</name></author><author><name>Krastan B. Blagoev</name></author><author><name>Hiroto Hatabu</name></author><author><name>Vlad I. Valtchinov</name></author><link href="http://arxiv.org/abs/2109.09844" rel="alternate"/><summary>Background: An early diagnosis together with an accurate disease progression
         1275 monitoring of multiple sclerosis is an important component of successful
         1276 disease management. Prior studies have established that multiple sclerosis is
         1277 correlated with speech discrepancies. Early research using objective acoustic
         1278 measurements has discovered measurable dysarthria.
         1279 </summary></entry><entry><id>http://arxiv.org/abs/2109.09833</id><title>Revisiting the Characteristics of Stochastic Gradient Noise and Dynamics</title><updated>2021-09-23T09:06:49.502515+00:00</updated><author><name>Yixin Wu</name></author><author><name>Rui Luo</name></author><author><name>Chen Zhang</name></author><author><name>Jun Wang</name></author><author><name>Yaodong Yang</name></author><link href="http://arxiv.org/abs/2109.09833" rel="alternate"/><summary>In this paper, we characterize the noise of stochastic gradients and analyze
         1280 the noise-induced dynamics during training deep neural networks by
         1281 gradient-based optimizers. Specifically, we firstly show that the stochastic
         1282 gradient noise possesses finite variance, and therefore the classical Central
         1283 Limit Theorem (CLT) applies; this indicates that the gradient noise is
         1284 asymptotically Gaussian. Such an asymptotic result validates the wide-accepted
         1285 assumption of Gaussian noise. We clarify that the recently observed phenomenon
          1286 of heavy tails within gradient noise may not be an intrinsic property, but the
         1287 consequence of insufficient mini-batch size; the gradient noise, which is a sum
         1288 of limited i.i.d. random variables, has not reached the asymptotic regime of
         1289 CLT, thus deviates from Gaussian. We quantitatively measure the goodness of
         1290 Gaussian approximation of the noise, which supports our conclusion. Secondly,
         1291 we analyze the noise-induced dynamics of stochastic gradient descent using the
          1292 Langevin equation, giving the momentum hyperparameter in the optimizer a
         1293 physical interpretation. We then proceed to demonstrate the existence of the
         1294 steady-state distribution of stochastic gradient descent and approximate the
         1295 distribution at a small learning rate.
         1296 </summary></entry><entry><id>http://arxiv.org/abs/2109.09829</id><title>Towards Energy-Efficient and Secure Edge AI: A Cross-Layer Framework</title><updated>2021-09-23T09:06:49.502052+00:00</updated><author><name>Muhammad Shafique</name></author><author><name>Alberto Marchisio</name></author><author><name>Rachmad Vidya Wicaksana Putra</name></author><author><name>Muhammad Abdullah Hanif</name></author><link href="http://arxiv.org/abs/2109.09829" rel="alternate"/><summary>The security and privacy concerns along with the amount of data that is
          1297 required to be processed on a regular basis have pushed processing to the edge of
         1298 the computing systems. Deploying advanced Neural Networks (NN), such as deep
         1299 neural networks (DNNs) and spiking neural networks (SNNs), that offer
         1300 state-of-the-art results on resource-constrained edge devices is challenging
         1301 due to the stringent memory and power/energy constraints. Moreover, these
         1302 systems are required to maintain correct functionality under diverse security
         1303 and reliability threats. This paper first discusses existing approaches to
         1304 address energy efficiency, reliability, and security issues at different system
         1305 layers, i.e., hardware (HW) and software (SW). Afterward, we discuss how to
         1306 further improve the performance (latency) and the energy efficiency of Edge AI
         1307 systems through HW/SW-level optimizations, such as pruning, quantization, and
         1308 approximation. To address reliability threats (like permanent and transient
         1309 faults), we highlight cost-effective mitigation techniques, like fault-aware
         1310 training and mapping. Moreover, we briefly discuss effective detection and
         1311 protection techniques to address security threats (like model and data
         1312 corruption). Towards the end, we discuss how these techniques can be combined
         1313 in an integrated cross-layer framework for realizing robust and
         1314 energy-efficient Edge AI systems.
          1315 </summary></entry><entry><id>http://arxiv.org/abs/2109.09825</id><title>Data Augmentation Methods for Anaphoric Zero Pronouns</title><updated>2021-09-23T09:06:49.501641+00:00</updated><author><name>Abdulrahman Aloraini</name></author><author><name>Massimo Poesio</name></author><link href="http://arxiv.org/abs/2109.09825" rel="alternate"/><summary>In pro-drop languages like Arabic, Chinese, Italian, Japanese, Spanish, and
         1316 many others, unrealized (null) arguments in certain syntactic positions can
         1317 refer to a previously introduced entity, and are thus called anaphoric zero
         1318 pronouns. The existing resources for studying anaphoric zero pronoun
         1319 interpretation are however still limited. In this paper, we use five data
         1320 augmentation methods to generate and detect anaphoric zero pronouns
         1321 automatically. We use the augmented data as additional training materials for
         1322 two anaphoric zero pronoun systems for Arabic. Our experimental results show
         1323 that data augmentation improves the performance of the two systems, surpassing
         1324 the state-of-the-art results.
         1325 </summary></entry><entry><id>http://arxiv.org/abs/2109.09809</id><title>Counterfactual Instances Explain Little</title><updated>2021-09-23T09:06:49.501241+00:00</updated><author><name>Adam White</name></author><author><name>Artur d'Avila Garcez</name></author><link href="http://arxiv.org/abs/2109.09809" rel="alternate"/><summary>In many applications, it is important to be able to explain the decisions of
         1326 machine learning systems. An increasingly popular approach has been to seek to
         1327 provide \emph{counterfactual instance explanations}. These specify close
         1328 possible worlds in which, contrary to the facts, a person receives their
         1329 desired decision from the machine learning system. This paper will draw on
         1330 literature from the philosophy of science to argue that a satisfactory
         1331 explanation must consist of both counterfactual instances and a causal equation
         1332 (or system of equations) that support the counterfactual instances. We will
         1333 show that counterfactual instances by themselves explain little. We will
         1334 further illustrate how explainable AI methods that provide both causal
         1335 equations and counterfactual instances can successfully explain machine
         1336 learning predictions.
         1337 </summary></entry><entry><id>http://arxiv.org/abs/2109.09807</id><title>I Know You Can't See Me: Dynamic Occlusion-Aware Safety Validation of Strategic Planners for Autonomous Vehicles Using Hypergames</title><updated>2021-09-23T09:06:49.500759+00:00</updated><author><name>Maximilian Kahn</name></author><author><name>Atrisha Sarkar</name></author><author><name>Krzysztof Czarnecki</name></author><link href="http://arxiv.org/abs/2109.09807" rel="alternate"/><summary>A particular challenge for both autonomous and human driving is dealing with
         1338 risk associated with dynamic occlusion, i.e., occlusion caused by other
         1339 vehicles in traffic. Based on the theory of hypergames, we develop a novel
         1340 multi-agent dynamic occlusion risk (DOR) measure for assessing situational risk
         1341 in dynamic occlusion scenarios. Furthermore, we present a white-box,
         1342 scenario-based, accelerated safety validation framework for assessing safety of
         1343 strategic planners in AV. Based on evaluation over a large naturalistic
         1344 database, our proposed validation method achieves a 4000% speedup compared to
          1345 direct validation on naturalistic data, more diverse coverage, and the ability to
         1346 generalize beyond the dataset and generate commonly observed dynamic occlusion
         1347 crashes in traffic in an automated manner.
         1348 </summary></entry><entry><id>http://arxiv.org/abs/2109.09791</id><title>Prediction of severe thunderstorm events with ensemble deep learning and radar data</title><updated>2021-09-23T09:06:49.499746+00:00</updated><author><name>Sabrina Guastavino</name></author><author><name>Michele Piana</name></author><author><name>Marco Tizzi</name></author><author><name>Federico Cassola</name></author><author><name>Antonio Iengo</name></author><author><name>Davide Sacchetti</name></author><author><name>Enrico Solazzo</name></author><author><name>Federico Benvenuto</name></author><link href="http://arxiv.org/abs/2109.09791" rel="alternate"/><summary>The problem of nowcasting extreme weather events can be addressed by applying
         1349 either numerical methods for the solution of dynamic model equations or
         1350 data-driven artificial intelligence algorithms. Within this latter framework,
         1351 the present paper illustrates how a deep learning method, exploiting videos of
         1352 radar reflectivity frames as input, can be used to realize a warning machine
         1353 able to sound timely alarms of possible severe thunderstorm events. From a
         1354 technical viewpoint, the computational core of this approach is the use of a
         1355 value-weighted skill score for both transforming the probabilistic outcomes of
         1356 the deep neural network into binary classification and assessing the
         1357 forecasting performances. The warning machine has been validated against
         1358 weather radar data recorded in the Liguria region, in Italy,
         1359