python_feedgen09_jnboehm.com.atom.xml - sfeed_tests - sfeed tests and RSS and Atom files
(HTM) git clone git://git.codemadness.org/sfeed_tests
(DIR) Log
(DIR) Files
(DIR) Refs
(DIR) README
(DIR) LICENSE
---
python_feedgen09_jnboehm.com.atom.xml (142453B)
---
1 <?xml version='1.0' encoding='UTF-8'?>
2 <feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><id>http://arxiv.org/</id><title>arxiv parsed</title><updated>2021-09-23T09:06:47.938938+00:00</updated><author><name>Jan Niklas Böhm</name><email>jan-niklas.boehm@uni-tuebingen.de</email></author><link href="http://arxiv.org" rel="alternate"/><link href="https://jnboehm.com" rel="self"/><generator uri="https://lkiesow.github.io/python-feedgen" version="0.9.0">python-feedgen</generator><subtitle>This parses the arxiv feed and filters interesting (to me) articles!</subtitle><entry><id>http://arxiv.org/abs/2109.09705</id><title>Neural forecasting at scale (update)</title><updated>2021-09-23T09:06:49.539266+00:00</updated><author><name>Philippe Chatigny</name></author><author><name>Shengrui Wang Jean-Marc Patenaude</name></author><author><name>Boris N. Oreshkin</name></author><link href="http://arxiv.org/abs/2109.09705" rel="alternate"/><summary>We study the problem of efficiently scaling ensemble-based deep neural
3 networks for time series (TS) forecasting on a large set of time series.
4 Current state-of-the-art deep ensemble models have high memory and
5 computational requirements, hampering their use to forecast millions of TS in
6 practical scenarios. We propose N-BEATS(P), a global multivariate variant of
7 the N-BEATS model designed to allow simultaneous training of multiple
8 univariate TS forecasting models. Our model addresses the practical limitations
9 of related models, reducing the training time by half and memory requirement by
10 a factor of 5, while keeping the same level of accuracy. We have performed
11 multiple experiments detailing the various ways to train our model and have
12 obtained results that demonstrate its capacity to support zero-shot TS
13 forecasting, i.e., to train a neural network on a source TS dataset and deploy
14 it on a different target TS dataset without retraining, which provides an
15 efficient and reliable solution to forecast at scale even in difficult
16 forecasting conditions.
17 </summary></entry><entry><id>http://arxiv.org/abs/2109.02624</id><title>Functional additive regression on shape and form manifolds of planar curves (update)</title><updated>2021-09-23T09:06:49.538917+00:00</updated><author><name>Almond Stöcker</name></author><author><name>Sonja Greven</name></author><link href="http://arxiv.org/abs/2109.02624" rel="alternate"/><summary>Defining shape and form as equivalence classes under translation, rotation
18 and -- for shapes -- also scale, we extend generalized additive regression to
19 models for the shape/form of planar curves or landmark configurations. The
20 model respects the resulting quotient geometry of the response, employing the
21 squared geodesic distance as loss function and a geodesic response function
22 mapping the additive predictor to the shape/form space. For fitting the model,
23 we propose a Riemannian $L_2$-Boosting algorithm well-suited for a potentially
24 large number of possibly parameter-intensive model terms, which also yiels
25 automated model selection. We provide novel intuitively interpretable
26 visualizations for (even non-linear) covariate effects in the shape/form space
27 via suitable tensor based factorizations. The usefulness of the proposed
28 framework is illustrated in an analysis of 1) astragalus shapes of wild and
29 domesticated sheep and 2) cell forms generated in a biophysical model, as well
30 as 3) in a realistic simulation study with response shapes and forms motivated
31 from a dataset on bottle outlines.
32 </summary></entry><entry><id>http://arxiv.org/abs/2107.04136</id><title>Diagonal Nonlinear Transformations Preserve Structure in Covariance and Precision Matrices (update)</title><updated>2021-09-23T09:06:49.538501+00:00</updated><author><name>Rebecca E Morrison</name></author><author><name>Ricardo Baptista</name></author><author><name>Estelle L Basor</name></author><link href="http://arxiv.org/abs/2107.04136" rel="alternate"/><summary>For a multivariate normal distribution, the sparsity of the covariance and
33 precision matrices encodes complete information about independence and
34 conditional independence properties. For general distributions, the covariance
35 and precision matrices reveal correlations and so-called partial correlations
36 between variables, but these do not, in general, have any correspondence with
37 respect to independence properties. In this paper, we prove that, for a certain
38 class of non-Gaussian distributions, these correspondences still hold, exactly
39 for the covariance and approximately for the precision. The distributions --
40 sometimes referred to as "nonparanormal" -- are given by diagonal
41 transformations of multivariate normal random variables. We provide several
42 analytic and numerical examples illustrating these results.
43 </summary></entry><entry><id>http://arxiv.org/abs/2106.09370</id><title>A deep generative model for probabilistic energy forecasting in power systems: normalizing flows (update)</title><updated>2021-09-23T09:06:49.538071+00:00</updated><author><name>Jonathan Dumas</name></author><author><name>Antoine Wehenkel Damien Lanaspeze</name></author><author><name>Bertrand Cornélusse</name></author><author><name>Antonio Sutera</name></author><link href="http://arxiv.org/abs/2106.09370" rel="alternate"/><summary>Greater direct electrification of end-use sectors with a higher share of
44 renewables is one of the pillars to power a carbon-neutral society by 2050.
45 However, in contrast to conventional power plants, renewable energy is subject
46 to uncertainty raising challenges for their interaction with power systems.
47 Scenario-based probabilistic forecasting models have become a vital tool to
48 equip decision-makers. This paper presents to the power systems forecasting
49 practitioners a recent deep learning technique, the normalizing flows, to
50 produce accurate scenario-based probabilistic forecasts that are crucial to
51 face the new challenges in power systems applications. The strength of this
52 technique is to directly learn the stochastic multivariate distribution of the
53 underlying process by maximizing the likelihood. Through comprehensive
54 empirical evaluations using the open data of the Global Energy Forecasting
55 Competition 2014, we demonstrate that this methodology is competitive with
56 other state-of-the-art deep learning generative models: generative adversarial
57 networks and variational autoencoders. The models producing weather-based wind,
58 solar power, and load scenarios are properly compared in terms of forecast
59 value by considering the case study of an energy retailer and quality using
60 several complementary metrics. The numerical experiments are simple and easily
61 reproducible. Thus, we hope it will encourage other forecasting practitioners
62 to test and use normalizing flows in power system applications such as bidding
63 on electricity markets, scheduling power systems with high renewable energy
64 sources penetration, energy management of virtual power plan or microgrids, and
65 unit commitment.
66 </summary></entry><entry><id>http://arxiv.org/abs/2105.14367</id><title>Deconvolutional Density Network: Modeling Free-Form Conditional Distributions (update)</title><updated>2021-09-23T09:06:49.537668+00:00</updated><author><name>Bing Chen</name></author><author><name>Mazharul Islam</name></author><author><name>Jisuo Gao</name></author><author><name>Lin Wang</name></author><link href="http://arxiv.org/abs/2105.14367" rel="alternate"/><summary>Conditional density estimation (CDE) is the task of estimating the
67 probability of an event conditioned on some inputs. A neural network (NN) can
68 be used to compute the output distribution for continuous-domain, but it is
69 difficult to explicitly approximate a free-form one without knowing the
70 information of its general form a priori. In order to fit an arbitrary
71 conditional distribution, discretizing the continuous domain into bins is an
72 effective strategy, as long as we have sufficiently narrow bins and very large
73 data. However, collecting enough data is often hard to reach and falls far
74 short of that ideal in many circumstances, especially in multivariate CDE for
75 the curse of dimensionality. In this paper, we demonstrate the benefits of
76 modeling free-form conditional distributions using a deconvolution-based neural
77 net framework, coping with data deficiency problems in discretization. It has
78 the advantage of being flexible but also takes advantage of the hierarchical
79 smoothness offered by the deconvolution layers. We compare our method to a
80 number of other density-estimation approaches and show that our Deconvolutional
81 Density Network (DDN) outperforms the competing methods on many univariate and
82 multivariate tasks.
83 </summary></entry><entry><id>http://arxiv.org/abs/2102.07767</id><title>Communication-efficient Distributed Cooperative Learning with Compressed Beliefs (update)</title><updated>2021-09-23T09:06:49.537320+00:00</updated><author><name>Mohammad Taha Toghani</name></author><author><name>César A. Uribe</name></author><link href="http://arxiv.org/abs/2102.07767" rel="alternate"/><summary>We study the problem of distributed cooperative learning, where a group of
84 agents seeks to agree on a set of hypotheses that best describes a sequence of
85 private observations. In the scenario where the set of hypotheses is large, we
86 propose a belief update rule where agents share compressed (either sparse or
87 quantized) beliefs with an arbitrary positive compression rate. Our algorithm
88 leverages a unified communication rule that enables agents to access
89 wide-ranging compression operators as black-box modules. We prove the almost
90 sure asymptotic exponential convergence of beliefs around the set of optimal
91 hypotheses. Additionally, we show a non-asymptotic, explicit, and linear
92 concentration rate in probability of the beliefs on the optimal hypothesis set.
93 We provide numerical experiments to illustrate the communication benefits of
94 our method. The simulation results show that the number of transmitted bits can
95 be reduced to 5-10% of the non-compressed method in the studied scenarios.
96 </summary></entry><entry><id>http://arxiv.org/abs/2012.15059</id><title>Ensembles of Localised Models for Time Series Forecasting (update)</title><updated>2021-09-23T09:06:49.536891+00:00</updated><author><name>Rakshitha Godahewa</name></author><author><name>Kasun Bandara</name></author><author><name>Geoffrey I. Webb</name></author><author><name>Slawek Smyl</name></author><author><name>Christoph Bergmeir</name></author><link href="http://arxiv.org/abs/2012.15059" rel="alternate"/><summary>With large quantities of data typically available nowadays, forecasting
97 models that are trained across sets of time series, known as Global Forecasting
98 Models (GFM), are regularly outperforming traditional univariate forecasting
99 models that work on isolated series. As GFMs usually share the same set of
100 parameters across all time series, they often have the problem of not being
101 localised enough to a particular series, especially in situations where
102 datasets are heterogeneous. We study how ensembling techniques can be used with
103 generic GFMs and univariate models to solve this issue. Our work systematises
104 and compares relevant current approaches, namely clustering series and training
105 separate submodels per cluster, the so-called ensemble of specialists approach,
106 and building heterogeneous ensembles of global and local models. We fill some
107 gaps in the existing GFM localisation approaches, in particular by
108 incorporating varied clustering techniques such as feature-based clustering,
109 distance-based clustering and random clustering, and generalise them to use
110 different underlying GFM model types. We then propose a new methodology of
111 clustered ensembles where we train multiple GFMs on different clusters of
112 series, obtained by changing the number of clusters and cluster seeds. Using
113 Feed-forward Neural Networks, Recurrent Neural Networks, and Pooled Regression
114 models as the underlying GFMs, in our evaluation on eight publicly available
115 datasets, the proposed models are able to achieve significantly higher accuracy
116 than baseline GFM models and univariate forecasting methods.
117 </summary></entry><entry><id>http://arxiv.org/abs/2009.13267</id><title>Energy-Based Reranking: Improving Neural Machine Translation Using Energy-Based Models (update)</title><updated>2021-09-23T09:06:49.536440+00:00</updated><author><name>Sumanta Bhattacharyya</name></author><author><name>Amirmohammad Rooshenas</name></author><author><name>Subhajit Naskar</name></author><author><name>Simeng Sun</name></author><author><name>Mohit Iyyer</name></author><author><name>Andrew McCallum</name></author><link href="http://arxiv.org/abs/2009.13267" rel="alternate"/><summary>The discrepancy between maximum likelihood estimation (MLE) and task measures
118 such as BLEU score has been studied before for autoregressive neural machine
119 translation (NMT) and resulted in alternative training algorithms (Ranzato et
120 al., 2016; Norouzi et al., 2016; Shen et al., 2016; Wu et al., 2018). However,
121 MLE training remains the de facto approach for autoregressive NMT because of
122 its computational efficiency and stability. Despite this mismatch between the
123 training objective and task measure, we notice that the samples drawn from an
124 MLE-based trained NMT support the desired distribution -- there are samples
125 with much higher BLEU score comparing to the beam decoding output. To benefit
126 from this observation, we train an energy-based model to mimic the behavior of
127 the task measure (i.e., the energy-based model assigns lower energy to samples
128 with higher BLEU score), which is resulted in a re-ranking algorithm based on
129 the samples drawn from NMT: energy-based re-ranking (EBR). We use both marginal
130 energy models (over target sentence) and joint energy models (over both source
131 and target sentences). Our EBR with the joint energy model consistently
132 improves the performance of the Transformer-based NMT: +4 BLEU points on
133 IWSLT'14 German-English, +3.0 BELU points on Sinhala-English, +1.2 BLEU on
134 WMT'16 English-German tasks.
135 </summary></entry><entry><id>http://arxiv.org/abs/2005.11079</id><title>Graph Random Neural Network for Semi-Supervised Learning on Graphs (update)</title><updated>2021-09-23T09:06:49.535864+00:00</updated><author><name>Wenzheng Feng</name></author><author><name>Jie Zhang</name></author><author><name>Yuxiao Dong</name></author><author><name>Yu Han</name></author><author><name>Huanbo Luan</name></author><author><name>Qian Xu</name></author><author><name>Qiang Yang</name></author><author><name>Evgeny Kharlamov</name></author><author><name>Jie Tang</name></author><link href="http://arxiv.org/abs/2005.11079" rel="alternate"/><summary>We study the problem of semi-supervised learning on graphs, for which graph
136 neural networks (GNNs) have been extensively explored. However, most existing
137 GNNs inherently suffer from the limitations of over-smoothing, non-robustness,
138 and weak-generalization when labeled nodes are scarce. In this paper, we
139 propose a simple yet effective framework -- GRAPH RANDOM NEURAL NETWORKS
140 (GRAND) -- to address these issues. In GRAND, we first design a random
141 propagation strategy to perform graph data augmentation. Then we leverage
142 consistency regularization to optimize the prediction consistency of unlabeled
143 nodes across different data augmentations. Extensive experiments on graph
144 benchmark datasets suggest that GRAND significantly outperforms
145 state-of-the-art GNN baselines on semi-supervised node classification. Finally,
146 we show that GRAND mitigates the issues of over-smoothing and non-robustness,
147 exhibiting better generalization behavior than existing GNNs. The source code
148 of GRAND is publicly available at https://github.com/Grand20/grand.
149 </summary></entry><entry><id>http://arxiv.org/abs/2004.14427</id><title>Whittle index based Q-learning for restless bandits with average reward (update)</title><updated>2021-09-23T09:06:49.535532+00:00</updated><author><name>Konstantin E. Avrachenkov</name></author><author><name>Vivek S. Borkar</name></author><link href="http://arxiv.org/abs/2004.14427" rel="alternate"/><summary>A novel reinforcement learning algorithm is introduced for multiarmed
150 restless bandits with average reward, using the paradigms of Q-learning and
151 Whittle index. Specifically, we leverage the structure of the Whittle index
152 policy to reduce the search space of Q-learning, resulting in major
153 computational gains. Rigorous convergence analysis is provided, supported by
154 numerical experiments. The numerical experiments show excellent empirical
155 performance of the proposed scheme.
156 </summary></entry><entry><id>http://arxiv.org/abs/2003.05738</id><title>IG-RL: Inductive Graph Reinforcement Learning for Massive-Scale Traffic Signal Control (update)</title><updated>2021-09-23T09:06:49.535152+00:00</updated><author><name>François-Xavier Devailly</name></author><author><name>Denis Larocque</name></author><author><name>Laurent Charlin</name></author><link href="http://arxiv.org/abs/2003.05738" rel="alternate"/><summary>Scaling adaptive traffic-signal control involves dealing with combinatorial
157 state and action spaces. Multi-agent reinforcement learning attempts to address
158 this challenge by distributing control to specialized agents. However,
159 specialization hinders generalization and transferability, and the
160 computational graphs underlying neural-networks architectures -- dominating in
161 the multi-agent setting -- do not offer the flexibility to handle an arbitrary
162 number of entities which changes both between road networks, and over time as
163 vehicles traverse the network. We introduce Inductive Graph Reinforcement
164 Learning (IG-RL) based on graph-convolutional networks which adapts to the
165 structure of any road network, to learn detailed representations of
166 traffic-controllers and their surroundings. Our decentralized approach enables
167 learning of a transferable-adaptive-traffic-signal-control policy. After being
168 trained on an arbitrary set of road networks, our model can generalize to new
169 road networks, traffic distributions, and traffic regimes, with no additional
170 training and a constant number of parameters, enabling greater scalability
171 compared to prior methods. Furthermore, our approach can exploit the
172 granularity of available data by capturing the (dynamic) demand at both the
173 lane and the vehicle levels. The proposed method is tested on both road
174 networks and traffic settings never experienced during training. We compare
175 IG-RL to multi-agent reinforcement learning and domain-specific baselines. In
176 both synthetic road networks and in a larger experiment involving the control
177 of the 3,971 traffic signals of Manhattan, we show that different
178 instantiations of IG-RL outperform baselines.
179 </summary></entry><entry><id>http://arxiv.org/abs/1905.10029</id><title>Power up! Robust Graph Convolutional Network via Graph Powering (update)</title><updated>2021-09-23T09:06:49.534750+00:00</updated><author><name>Ming Jin</name></author><author><name>Heng Chang</name></author><author><name>Wenwu Zhu</name></author><author><name>Somayeh Sojoudi</name></author><link href="http://arxiv.org/abs/1905.10029" rel="alternate"/><summary>Graph convolutional networks (GCNs) are powerful tools for graph-structured
180 data. However, they have been recently shown to be vulnerable to topological
181 attacks. To enhance adversarial robustness, we go beyond spectral graph theory
182 to robust graph theory. By challenging the classical graph Laplacian, we
183 propose a new convolution operator that is provably robust in the spectral
184 domain and is incorporated in the GCN architecture to improve expressivity and
185 interpretability. By extending the original graph to a sequence of graphs, we
186 also propose a robust training paradigm that encourages transferability across
187 graphs that span a range of spatial and spectral characteristics. The proposed
188 approaches are demonstrated in extensive experiments to simultaneously improve
189 performance in both benign and adversarial situations.
190 </summary></entry><entry><id>http://arxiv.org/abs/2109.10319</id><title>Consistency of spectral clustering for directed network community detection</title><updated>2021-09-23T09:06:49.534381+00:00</updated><author><name>Huan Qing</name></author><author><name>Jingli Wang</name></author><link href="http://arxiv.org/abs/2109.10319" rel="alternate"/><summary>Directed networks appear in various areas, such as biology, sociology,
191 physiology and computer science. However, at present, most network analysis
192 ignores the direction. In this paper, we construct a spectral clustering method
193 based on the singular decomposition of the adjacency matrix to detect community
194 in directed stochastic block model (DiSBM). By considering a sparsity
195 parameter, under some mild conditions, we show the proposed approach can
196 consistently recover hidden row and column communities for different scaling of
197 degrees.
198 </summary></entry><entry><id>http://arxiv.org/abs/2109.10298</id><title>Assured Neural Network Architectures for Control and Identification of Nonlinear Systems</title><updated>2021-09-23T09:06:49.534036+00:00</updated><author><name>James Ferlez</name></author><author><name>Yasser Shoukry</name></author><link href="http://arxiv.org/abs/2109.10298" rel="alternate"/><summary>In this paper, we consider the problem of automatically designing a Rectified
199 Linear Unit (ReLU) Neural Network (NN) architecture (number of layers and
200 number of neurons per layer) with the assurance that it is sufficiently
201 parametrized to control a nonlinear system; i.e. control the system to satisfy
202 a given formal specification. This is unlike current techniques, which provide
203 no assurances on the resultant architecture. Moreover, our approach requires
204 only limited knowledge of the underlying nonlinear system and specification. We
205 assume only that the specification can be satisfied by a Lipschitz-continuous
206 controller with a known bound on its Lipschitz constant; the specific
207 controller need not be known. From this assumption, we bound the number of
208 affine functions needed to construct a Continuous Piecewise Affine (CPWA)
209 function that can approximate any Lipschitz-continuous controller that
210 satisfies the specification. Then we connect this CPWA to a NN architecture
211 using the authors' recent results on the Two-Level Lattice (TLL) NN
212 architecture; the TLL architecture was shown to be parameterized by the number
213 of affine functions present in the CPWA function it realizes.
214 </summary></entry><entry><id>http://arxiv.org/abs/2109.10279</id><title>Multiblock-Networks: A Neural Network Analog to Component Based Methods for Multi-Source Data</title><updated>2021-09-23T09:06:49.533577+00:00</updated><author><name>Anna Jenul</name></author><author><name>Stefan Schrunner</name></author><author><name>Runar Helin</name></author><author><name>Kristian Hovde Liland</name></author><author><name>Cecilia Marie Futsæther</name></author><author><name>Oliver Tomic</name></author><link href="http://arxiv.org/abs/2109.10279" rel="alternate"/><summary>Training predictive models on datasets from multiple sources is a common, yet
215 challenging setup in applied machine learning. Even though model interpretation
216 has attracted more attention in recent years, many modeling approaches still
217 focus mainly on performance. To further improve the interpretability of machine
218 learning models, we suggest the adoption of concepts and tools from the
219 well-established framework of component based multiblock analysis, also known
220 as chemometrics. Nevertheless, artificial neural networks provide greater
221 flexibility in model architecture and thus, often deliver superior predictive
222 performance. In this study, we propose a setup to transfer the concepts of
223 component based statistical models, including multiblock variants of principal
224 component regression and partial least squares regression, to neural network
225 architectures. Thereby, we combine the flexibility of neural networks with the
226 concepts for interpreting block relevance in multiblock methods. In two use
227 cases we demonstrate how the concept can be implemented in practice, and
228 compare it to both common feed-forward neural networks without blocks, as well
229 as statistical component based multiblock methods. Our results underline that
230 multiblock networks allow for basic model interpretation while matching the
231 performance of ordinary feed-forward neural networks.
232 </summary></entry><entry><id>http://arxiv.org/abs/2109.10262</id><title>Generalized Optimization: A First Step Towards Category Theoretic Learning Theory</title><updated>2021-09-23T09:06:49.533249+00:00</updated><author><name>Dan Shiebler</name></author><link href="http://arxiv.org/abs/2109.10262" rel="alternate"/><summary>The Cartesian reverse derivative is a categorical generalization of
233 reverse-mode automatic differentiation. We use this operator to generalize
234 several optimization algorithms, including a straightforward generalization of
235 gradient descent and a novel generalization of Newton's method. We then explore
236 which properties of these algorithms are preserved in this generalized setting.
237 First, we show that the transformation invariances of these algorithms are
238 preserved: while generalized Newton's method is invariant to all invertible
239 linear transformations, generalized gradient descent is invariant only to
240 orthogonal linear transformations. Next, we show that we can express the change
241 in loss of generalized gradient descent with an inner product-like expression,
242 thereby generalizing the non-increasing and convergence properties of the
243 gradient descent optimization flow. Finally, we include several numerical
244 experiments to illustrate the ideas in the paper and demonstrate how we can use
245 them to optimize polynomial functions over an ordered ring.
246 </summary></entry><entry><id>http://arxiv.org/abs/2109.10254</id><title>Uncertainty Toolbox: an Open-Source Library for Assessing, Visualizing, and Improving Uncertainty Quantification</title><updated>2021-09-23T09:06:49.532048+00:00</updated><author><name>Youngseog Chung</name></author><author><name>Ian Char</name></author><author><name>Han Guo</name></author><author><name>Jeff Schneider</name></author><author><name>Willie Neiswanger</name></author><link href="http://arxiv.org/abs/2109.10254" rel="alternate"/><summary>With increasing deployment of machine learning systems in various real-world
247 tasks, there is a greater need for accurate quantification of predictive
248 uncertainty. While the common goal in uncertainty quantification (UQ) in
249 machine learning is to approximate the true distribution of the target data,
250 many works in UQ tend to be disjoint in the evaluation metrics utilized, and
251 disparate implementations for each metric lead to numerical results that are
252 not directly comparable across different works. To address this, we introduce
253 Uncertainty Toolbox, an open-source python library that helps to assess,
254 visualize, and improve UQ. Uncertainty Toolbox additionally provides
255 pedagogical resources, such as a glossary of key terms and an organized
256 collection of key paper references. We hope that this toolbox is useful for
257 accelerating and uniting research efforts in uncertainty in machine learning.
258 </summary></entry><entry><id>http://arxiv.org/abs/2109.10219</id><title>Adaptive Reliability Analysis for Multi-fidelity Models using a Collective Learning Strategy</title><updated>2021-09-23T09:06:49.531656+00:00</updated><author><name>Chi Zhang</name></author><author><name>Chaolin Song</name></author><author><name>Abdollah Shafieezadeh</name></author><link href="http://arxiv.org/abs/2109.10219" rel="alternate"/><summary>In many fields of science and engineering, models with different fidelities
259 are available. Physical experiments or detailed simulations that accurately
260 capture the behavior of the system are regarded as high-fidelity models with
261 low model uncertainty, however, they are expensive to run. On the other hand,
262 simplified physical experiments or numerical models are seen as low-fidelity
263 models that are cheaper to evaluate. Although low-fidelity models are often not
264 suitable for direct use in reliability analysis due to their low accuracy, they
265 can offer information about the trend of the high-fidelity model thus providing
266 the opportunity to explore the design space at a low cost. This study presents
267 a new approach called adaptive multi-fidelity Gaussian process for reliability
268 analysis (AMGPRA). Contrary to selecting training points and information
269 sources in two separate stages as done in state-of-the-art mfEGRA method, the
270 proposed approach finds the optimal training point and information source
271 simultaneously using the novel collective learning function (CLF). CLF is able
272 to assess the global impact of a candidate training point from an information
273 source and it accommodates any learning function that satisfies a certain
274 profile. In this context, CLF provides a new direction for quantifying the
275 impact of new training points and can be easily extended with new learning
276 functions to adapt to different reliability problems. The performance of the
277 proposed method is demonstrated by three mathematical examples and one
278 engineering problem concerning the wind reliability of transmission towers. It
279 is shown that the proposed method achieves similar or higher accuracy with
280 reduced computational costs compared to state-of-the-art single and
281 multi-fidelity methods. A key application of AMGPRA is high-fidelity fragility
282 modeling using complex and costly physics-based computational models.
283 </summary></entry><entry><id>http://arxiv.org/abs/2109.10162</id><title>Learning low-degree functions from a logarithmic number of random queries</title><updated>2021-09-23T09:06:49.531322+00:00</updated><author><name>Alexandros Eskenazis</name></author><author><name>Paata Ivanisvili</name></author><link href="http://arxiv.org/abs/2109.10162" rel="alternate"/><summary>We prove that for any integer $n\in\mathbb{N}$, $d\in\{1,\ldots,n\}$ and any
284 $\varepsilon,\delta\in(0,1)$, a bounded function $f:\{-1,1\}^n\to[-1,1]$ of
285 degree at most $d$ can be learned with probability at least $1-\delta$ and
286 $L_2$-error $\varepsilon$ using $\log(\tfrac{n}{\delta})\,\varepsilon^{-d-1}
287 C^{d^{3/2}\sqrt{\log d}}$ random queries for a universal finite constant $C>1$.
288 </summary></entry><entry><id>http://arxiv.org/abs/2109.09988</id><title>Signal Classification using Smooth Coefficients of Multiple wavelets</title><updated>2021-09-23T09:06:49.530981+00:00</updated><author><name>Paul Grant</name></author><author><name>Md Zahidul Islam</name></author><link href="http://arxiv.org/abs/2109.09988" rel="alternate"/><summary>Classification of time series signals has become an important construct and
289 has many practical applications. With existing classifiers we may be able to
290 accurately classify signals, however that accuracy may decline if using a
291 reduced number of attributes. Transforming the data then undertaking reduction
292 in dimensionality may improve the quality of the data analysis, decrease time
293 required for classification and simplify models. We propose an approach, which
294 chooses suitable wavelets to transform the data, then combines the output from
295 these transforms to construct a dataset to then apply ensemble classifiers to.
296 We demonstrate this on different data sets, across different classifiers and
297 use differing evaluation methods. Our experimental results demonstrate the
298 effectiveness of the proposed technique, compared to the approaches that use
299 either raw signal data or a single wavelet transform.
300 </summary></entry><entry><id>http://arxiv.org/abs/2109.09859</id><title>Sharp global convergence guarantees for iterative nonconvex optimization: A Gaussian process perspective</title><updated>2021-09-23T09:06:49.530601+00:00</updated><author><name>Kabir Aladin Chandrasekher</name></author><author><name>Ashwin Pananjady</name></author><author><name>Christos Thrampoulidis</name></author><link href="http://arxiv.org/abs/2109.09859" rel="alternate"/><summary>We consider a general class of regression models with normally distributed
301 covariates, and the associated nonconvex problem of fitting these models from
302 data. We develop a general recipe for analyzing the convergence of iterative
303 algorithms for this task from a random initialization. In particular, provided
304 each iteration can be written as the solution to a convex optimization problem
305 satisfying some natural conditions, we leverage Gaussian comparison theorems to
306 derive a deterministic sequence that provides sharp upper and lower bounds on
307 the error of the algorithm with sample-splitting. Crucially, this deterministic
308 sequence accurately captures both the convergence rate of the algorithm and the
309 eventual error floor in the finite-sample regime, and is distinct from the
310 commonly used "population" sequence that results from taking the
311 infinite-sample limit. We apply our general framework to derive several
312 concrete consequences for parameter estimation in popular statistical models
313 including phase retrieval and mixtures of regressions. Provided the sample size
314 scales near-linearly in the dimension, we show sharp global convergence rates
315 for both higher-order algorithms based on alternating updates and first-order
316 algorithms based on subgradient descent. These corollaries, in turn, yield
317 multiple consequences, including: (a) Proof that higher-order algorithms can
318 converge significantly faster than their first-order counterparts (and
319 sometimes super-linearly), even if the two share the same population update and
320 (b) Intricacies in super-linear convergence behavior for higher-order
321 algorithms, which can be nonstandard (e.g., with exponent 3/2) and sensitive to
322 the noise level in the problem. We complement these results with extensive
323 numerical experiments, which show excellent agreement with our theoretical
324 predictions.
325 </summary></entry><entry><id>http://arxiv.org/abs/2109.09856</id><title>SFFDD: Deep Neural Network with Enriched Features for Failure Prediction with Its Application to Computer Disk Driver</title><updated>2021-09-23T09:06:49.530264+00:00</updated><author><name>Lanfa Frank Wang</name></author><author><name>Danjue Li</name></author><link href="http://arxiv.org/abs/2109.09856" rel="alternate"/><summary>A classification technique incorporating a novel feature derivation method is
326 proposed for predicting failure of a system or device with multivariate time
327 series sensor data. We treat the multivariate time series sensor data as images
328 for both visualization and computation. Failure follows various patterns which
329 are closely related to the root causes. Different predefined transformations
330 are applied on the original sensors data to better characterize the failure
331 patterns. In addition to feature derivation, ensemble method is used to further
332 improve the performance. In addition, a general algorithm architecture of deep
333 neural network is proposed to handle multiple types of data with less manual
334 feature engineering. We apply the proposed method on the early predict failure
335 of computer disk drive in order to improve storage systems availability and
336 avoid data loss. The classification accuracy is largely improved with the
337 enriched features, named smart features.
338 </summary></entry><entry><id>http://arxiv.org/abs/2109.09855</id><title>Reinforcement Learning for Finite-Horizon Restless Multi-Armed Multi-Action Bandits</title><updated>2021-09-23T09:06:49.529889+00:00</updated><author><name>Guojun Xiong</name></author><author><name>Jian Li</name></author><author><name>Rahul Singh</name></author><link href="http://arxiv.org/abs/2109.09855" rel="alternate"/><summary>We study a finite-horizon restless multi-armed bandit problem with multiple
339 actions, dubbed R(MA)^2B. The state of each arm evolves according to a
340 controlled Markov decision process (MDP), and the reward of pulling an arm
341 depends on both the current state of the corresponding MDP and the action
342 taken. The goal is to sequentially choose actions for arms so as to maximize
343 the expected value of the cumulative rewards collected. Since finding the
344 optimal policy is typically intractable, we propose a computationally appealing
345 index policy which we call Occupancy-Measured-Reward Index Policy. Our policy
346 is well-defined even if the underlying MDPs are not indexable. We prove that it
347 is asymptotically optimal when the activation budget and number of arms are
348 scaled up, while keeping their ratio as a constant. For the case when the
349 system parameters are unknown, we develop a learning algorithm. Our learning
350 algorithm uses the principle of optimism in the face of uncertainty and further
351 uses a generative model in order to fully exploit the structure of
352 Occupancy-Measured-Reward Index Policy. We call it the R(MA)^2B-UCB algorithm.
353 As compared with the existing algorithms, R(MA)^2B-UCB performs close to an
354 offline optimum policy, and also achieves a sub-linear regret with a low
355 computational complexity. Experimental results show that R(MA)^2B-UCB
356 outperforms the existing algorithms in both regret and run time.
357 </summary></entry><entry><id>http://arxiv.org/abs/2109.09847</id><title>Fast TreeSHAP: Accelerating SHAP Value Computation for Trees</title><updated>2021-09-23T09:06:49.529569+00:00</updated><author><name>Jilei Yang</name></author><link href="http://arxiv.org/abs/2109.09847" rel="alternate"/><summary>SHAP (SHapley Additive exPlanation) values are one of the leading tools for
358 interpreting machine learning models, with strong theoretical guarantees
359 (consistency, local accuracy) and a wide availability of implementations and
360 use cases. Even though computing SHAP values takes exponential time in general,
361 TreeSHAP takes polynomial time on tree-based models. While the speedup is
362 significant, TreeSHAP can still dominate the computation time of industry-level
363 machine learning solutions on datasets with millions or more entries, causing
364 delays in post-hoc model diagnosis and interpretation service. In this paper we
365 present two new algorithms, Fast TreeSHAP v1 and v2, designed to improve the
366 computational efficiency of TreeSHAP for large datasets. We empirically find
367 that Fast TreeSHAP v1 is 1.5x faster than TreeSHAP while keeping the memory
368 cost unchanged. Similarly, Fast TreeSHAP v2 is 2.5x faster than TreeSHAP, at
369 the cost of a slightly higher memory usage, thanks to the pre-computation of
370 expensive TreeSHAP steps. We also show that Fast TreeSHAP v2 is well-suited for
371 multi-time model interpretations, resulting in as high as 3x faster explanation
372 of newly incoming samples.
373 </summary></entry><entry><id>http://arxiv.org/abs/2109.09831</id><title>SMAC3: A Versatile Bayesian Optimization Package for Hyperparameter Optimization</title><updated>2021-09-23T09:06:49.529020+00:00</updated><author><name>Marius Lindauer</name></author><author><name>Katharina Eggensperger</name></author><author><name>Matthias Feurer</name></author><author><name>André Biedenkapp</name></author><author><name>Difan Deng</name></author><author><name>Carolin Benjamins</name></author><author><name>René Sass</name></author><author><name>Frank Hutter</name></author><link href="http://arxiv.org/abs/2109.09831" rel="alternate"/><summary>Algorithm parameters, in particular hyperparameters of machine learning
374 algorithms, can substantially impact their performance. To support users in
375 determining well-performing hyperparameter configurations for their algorithms,
376 datasets and applications at hand, SMAC3 offers a robust and flexible framework
377 for Bayesian Optimization, which can improve performance within a few
378 evaluations. It offers several facades and pre-sets for typical use cases, such
379 as optimizing hyperparameters, solving low dimensional continuous (artificial)
380 global optimization problems and configuring algorithms to perform well across
381 multiple problem instances. The SMAC3 package is available under a permissive
382 BSD-license at https://github.com/automl/SMAC3.
383 </summary></entry><entry><id>http://arxiv.org/abs/2109.09816</id><title>Deviation-Based Learning</title><updated>2021-09-23T09:06:49.528686+00:00</updated><author><name>Junpei Komiyama</name></author><author><name>Shunya Noda</name></author><link href="http://arxiv.org/abs/2109.09816" rel="alternate"/><summary>We propose deviation-based learning, a new approach to training recommender
384 systems. In the beginning, the recommender and rational users have different
385 pieces of knowledge, and the recommender needs to learn the users' knowledge to
386 make better recommendations. The recommender learns users' knowledge by
387 observing whether each user followed or deviated from her recommendations. We
388 show that learning frequently stalls if the recommender always recommends a
389 choice: users tend to follow the recommendation blindly, and their choices do
390 not reflect their knowledge. Social welfare and the learning rate are improved
391 drastically if the recommender abstains from recommending a choice when she
392 predicts that multiple arms will produce a similar payoff.
393 </summary></entry><entry><id>http://arxiv.org/abs/2011.02602</id><title>Merchant Category Identification Using Credit Card Transactions</title><updated>2021-09-23T09:06:49.528234+00:00</updated><author><name>Chin-Chia Michael Yeh</name></author><author><name>Zhongfang Zhuang</name></author><author><name>Yan Zheng</name></author><author><name>Liang Wang</name></author><author><name>Junpeng Wang</name></author><author><name>Wei Zhang</name></author><link href="http://arxiv.org/abs/2011.02602" rel="alternate"/><summary>Digital payment volume has proliferated in recent years with the rapid growth
394 of small businesses and online shops. When processing these digital
395 transactions, recognizing each merchant's real identity (i.e., business type)
396 is vital to ensure the integrity of payment processing systems. Conventionally,
397 this problem is formulated as a time series classification problem solely using
398 the merchant transaction history. However, with the large scale of the data,
399 and changing behaviors of merchants and consumers over time, it is extremely
400 challenging to achieve satisfying performance from off-the-shelf classification
401 methods. In this work, we approach this problem from a multi-modal learning
402 perspective, where we use not only the merchant time series data but also the
403 information of merchant-merchant relationship (i.e., affinity) to verify the
404 self-reported business type (i.e., merchant category) of a given merchant.
405 Specifically, we design two individual encoders, where one is responsible for
406 encoding temporal information and the other is responsible for affinity
407 information, and a mechanism to fuse the outputs of the two encoders to
408 accomplish the identification task. Our experiments on real-world credit card
409 transaction data between 71,668 merchants and 433,772,755 customers have
410 demonstrated the effectiveness and efficiency of the proposed model.
411 </summary></entry><entry><id>http://arxiv.org/abs/2007.05303</id><title>Multi-future Merchant Transaction Prediction</title><updated>2021-09-23T09:06:49.527829+00:00</updated><author><name>Chin-Chia Michael Yeh</name></author><author><name>Zhongfang Zhuang</name></author><author><name>Wei Zhang</name></author><author><name>Liang Wang</name></author><link href="http://arxiv.org/abs/2007.05303" rel="alternate"/><summary>The multivariate time series generated from merchant transaction history can
412 provide critical insights for payment processing companies. The capability of
413 predicting merchants' future is crucial for fraud detection and recommendation
414 systems. Conventionally, this problem is formulated to predict one multivariate
415 time series under the multi-horizon setting. However, real-world applications
416 often require more than one future trend prediction considering the
417 uncertainties, where more than one multivariate time series needs to be
418 predicted. This problem is called multi-future prediction. In this work, we
419 combine the two research directions and propose to study this new problem:
420 multi-future, multi-horizon and multivariate time series prediction. This
421 problem is crucial as it has broad use cases in the financial industry to
422 reduce the risk while improving user experience by providing alternative
423 futures. This problem is also challenging as now we not only need to capture
424 the patterns and insights from the past but also train a model that has a
425 strong inference capability to project multiple possible outcomes. To solve
426 this problem, we propose a new model using convolutional neural networks and a
427 simple yet effective encoder-decoder structure to learn the time series pattern
428 from multiple perspectives. We use experiments on real-world merchant
429 transaction data to demonstrate the effectiveness of our proposed model. We
430 also provide extensive discussions on different model design choices in our
431 experimental section.
432 </summary></entry><entry><id>http://arxiv.org/abs/2109.09690</id><title>Trust Your Robots! Predictive Uncertainty Estimation of Neural Networks with Sparse Gaussian Processes (update)</title><updated>2021-09-23T09:06:49.527407+00:00</updated><author><name>Jongseok Lee</name></author><author><name>Jianxiang Feng</name></author><author><name>Matthias Humt</name></author><author><name>Marcus G. Müller</name></author><author><name>Rudolph Triebel</name></author><link href="http://arxiv.org/abs/2109.09690" rel="alternate"/><summary>This paper presents a probabilistic framework to obtain both reliable and
433 fast uncertainty estimates for predictions with Deep Neural Networks (DNNs).
434 Our main contribution is a practical and principled combination of DNNs with
435 sparse Gaussian Processes (GPs). We prove theoretically that DNNs can be seen
436 as a special case of sparse GPs, namely mixtures of GP experts (MoE-GP), and we
437 devise a learning algorithm that brings the derived theory into practice. In
438 experiments from two different robotic tasks -- inverse dynamics of a
439 manipulator and object detection on a micro-aerial vehicle (MAV) -- we show the
440 effectiveness of our approach in terms of predictive uncertainty, improved
441 scalability, and run-time efficiency on a Jetson TX2. We thus argue that our
442 approach can pave the way towards reliable and fast robot learning systems with
443 uncertainty awareness.
444 </summary></entry><entry><id>http://arxiv.org/abs/2109.09658</id><title>FUTURE-AI: Guiding Principles and Consensus Recommendations for Trustworthy Artificial Intelligence in Future Medical Imaging (update)</title><updated>2021-09-23T09:06:49.526638+00:00</updated><author><name>Karim Lekadir</name></author><author><name>Richard Osuala</name></author><author><name>Catherine Gallin</name></author><author><name>Noussair Lazrak</name></author><author><name>Kaisar Kushibar</name></author><author><name>Gianna Tsakou</name></author><author><name>Susanna Aussó</name></author><author><name>Leonor Cerdá Alberich</name></author><author><name>Konstantinos Marias</name></author><author><name>Manolis Tskinakis</name></author><author><name>Sara Colantonio</name></author><author><name>Nickolas Papanikolaou</name></author><author><name>Zohaib Salahuddin</name></author><author><name>Henry C Woodruff</name></author><author><name>Philippe Lambin</name></author><author><name>Luis Martí-Bonmatí</name></author><link href="http://arxiv.org/abs/2109.09658" rel="alternate"/><summary>The recent advancements in artificial intelligence (AI) combined with the
445 extensive amount of data generated by today's clinical systems, has led to the
446 development of imaging AI solutions across the whole value chain of medical
447 imaging, including image reconstruction, medical image segmentation,
448 image-based diagnosis and treatment planning. Notwithstanding the successes and
449 future potential of AI in medical imaging, many stakeholders are concerned of
450 the potential risks and ethical implications of imaging AI solutions, which are
451 perceived as complex, opaque, and difficult to comprehend, utilise, and trust
452 in critical clinical applications. Despite these concerns and risks, there are
453 currently no concrete guidelines and best practices for guiding future AI
454 developments in medical imaging towards increased trust, safety and adoption.
455 To bridge this gap, this paper introduces a careful selection of guiding
456 principles drawn from the accumulated experiences, consensus, and best
457 practices from five large European projects on AI in Health Imaging. These
458 guiding principles are named FUTURE-AI and its building blocks consist of (i)
459 Fairness, (ii) Universality, (iii) Traceability, (iv) Usability, (v) Robustness
460 and (vi) Explainability. In a step-by-step approach, these guidelines are
461 further translated into a framework of concrete recommendations for specifying,
462 developing, evaluating, and deploying technically, clinically and ethically
463 trustworthy AI solutions into clinical practice.
464 </summary></entry><entry><id>http://arxiv.org/abs/2109.09105</id><title>What BERT Based Language Models Learn in Spoken Transcripts: An Empirical Study (update)</title><updated>2021-09-23T09:06:49.526265+00:00</updated><author><name>Ayush Kumar</name></author><author><name>Mukuntha Narayanan Sundararaman</name></author><author><name>Jithendra Vepa</name></author><link href="http://arxiv.org/abs/2109.09105" rel="alternate"/><summary>Language Models (LMs) have been ubiquitously leveraged in various tasks
465 including spoken language understanding (SLU). Spoken language requires careful
466 understanding of speaker interactions, dialog states and speech induced
467 multimodal behaviors to generate a meaningful representation of the
468 conversation. In this work, we propose to dissect SLU into three representative
469 properties:conversational (disfluency, pause, overtalk), channel (speaker-type,
470 turn-tasks) and ASR (insertion, deletion,substitution). We probe BERT based
471 language models (BERT, RoBERTa) trained on spoken transcripts to investigate
472 its ability to understand multifarious properties in absence of any speech
473 cues. Empirical results indicate that LM is surprisingly good at capturing
474 conversational properties such as pause prediction and overtalk detection from
475 lexical tokens. On the downsides, the LM scores low on turn-tasks and ASR
476 errors predictions. Additionally, pre-training the LM on spoken transcripts
477 restrain its linguistic understanding. Finally, we establish the efficacy and
478 transferability of the mentioned properties on two benchmark datasets:
479 Switchboard Dialog Act and Disfluency datasets.
480 </summary></entry><entry><id>http://arxiv.org/abs/2109.07436</id><title>Synthesizing Policies That Account For Human Execution Errors Caused By State-Aliasing In Markov Decision Processes (update)</title><updated>2021-09-23T09:06:49.525891+00:00</updated><author><name>Sriram Gopalakrishnan</name></author><author><name>Mudit Verma</name></author><author><name>Subbarao Kambhampati</name></author><link href="http://arxiv.org/abs/2109.07436" rel="alternate"/><summary>When humans are given a policy to execute, there can be policy execution
481 errors and deviations in execution if there is uncertainty in identifying a
482 state. So an algorithm that computes a policy for a human to execute ought to
483 consider these effects in its computations. An optimal MDP policy that is
484 poorly executed (because of a human agent) maybe much worse than another policy
485 that is executed with fewer errors. In this paper, we consider the problems of
486 erroneous execution and execution delay when computing policies for a human
487 agent that would act in a setting modeled by a Markov Decision Process. We
488 present a framework to model the likelihood of policy execution errors and
489 likelihood of non-policy actions like inaction (delays) due to state
490 uncertainty. This is followed by a hill climbing algorithm to search for good
491 policies that account for these errors. We then use the best policy found by
492 hill climbing with a branch and bound algorithm to find the optimal policy. We
493 show experimental results in a Gridworld domain and analyze the performance of
494 the two algorithms. We also present human studies that verify if our
495 assumptions on policy execution by humans under state-aliasing are reasonable.
496 </summary></entry><entry><id>http://arxiv.org/abs/2109.01134</id><title>Learning to Prompt for Vision-Language Models (update)</title><updated>2021-09-23T09:06:49.525484+00:00</updated><author><name>Kaiyang Zhou</name></author><author><name>Jingkang Yang</name></author><author><name>Chen Change Loy</name></author><author><name>Ziwei Liu</name></author><link href="http://arxiv.org/abs/2109.01134" rel="alternate"/><summary>Vision-language pre-training has recently emerged as a promising alternative
497 for representation learning. It shifts from the tradition of using images and
498 discrete labels for learning a fixed set of weights, seen as visual concepts,
499 to aligning images and raw text for two separate encoders. Such a paradigm
500 benefits from a broader source of supervision and allows zero-shot transfer to
501 downstream tasks since visual concepts can be diametrically generated from
502 natural language, known as prompt. In this paper, we identify that a major
503 challenge of deploying such models in practice is prompt engineering. This is
504 because designing a proper prompt, especially for context words surrounding a
505 class name, requires domain expertise and typically takes a significant amount
506 of time for words tuning since a slight change in wording could have a huge
507 impact on performance. Moreover, different downstream tasks require specific
508 designs, further hampering the efficiency of deployment. To overcome this
509 challenge, we propose a novel approach named context optimization (CoOp). The
510 main idea is to model context in prompts using continuous representations and
511 perform end-to-end learning from data while keeping the pre-trained parameters
512 fixed. In this way, the design of task-relevant prompts can be fully automated.
513 Experiments on 11 datasets show that CoOp effectively turns pre-trained
514 vision-language models into data-efficient visual learners, requiring as few as
515 one or two shots to beat hand-crafted prompts with a decent margin and able to
516 gain significant improvements when using more shots (e.g., at 16 shots the
517 average gain is around 17% with the highest reaching over 50%). CoOp also
518 exhibits strong robustness to distribution shift.
519 </summary></entry><entry><id>http://arxiv.org/abs/2108.09432</id><title>ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape Generators (update)</title><updated>2021-09-23T09:06:49.525027+00:00</updated><author><name>Qixing Huang</name></author><author><name>Xiangru Huang</name></author><author><name>Bo Sun</name></author><author><name>Zaiwei Zhang</name></author><author><name>Junfeng Jiang</name></author><author><name>Chandrajit Bajaj</name></author><link href="http://arxiv.org/abs/2108.09432" rel="alternate"/><summary>This paper introduces an unsupervised loss for training parametric
520 deformation shape generators. The key idea is to enforce the preservation of
521 local rigidity among the generated shapes. Our approach builds on an
522 approximation of the as-rigid-as possible (or ARAP) deformation energy. We show
523 how to develop the unsupervised loss via a spectral decomposition of the
524 Hessian of the ARAP energy. Our loss nicely decouples pose and shape variations
525 through a robust norm. The loss admits simple closed-form expressions. It is
526 easy to train and can be plugged into any standard generation models, e.g.,
527 variational auto-encoder (VAE) and auto-decoder (AD). Experimental results show
528 that our approach outperforms existing shape generation approaches considerably
529 on public benchmark datasets of various shape categories such as human, animal
530 and bone.
531 </summary></entry><entry><id>http://arxiv.org/abs/2107.11913</id><title>Measuring Ethics in AI with AI: A Methodology and Dataset Construction (update)</title><updated>2021-09-23T09:06:49.524619+00:00</updated><author><name>Pedro H.C. Avelar</name></author><author><name>Rafael B. Audibert</name></author><author><name>Anderson R. Tavares</name></author><author><name>Luís C. Lamb</name></author><link href="http://arxiv.org/abs/2107.11913" rel="alternate"/><summary>Recently, the use of sound measures and metrics in Artificial Intelligence
532 has become the subject of interest of academia, government, and industry.
533 Efforts towards measuring different phenomena have gained traction in the AI
534 community, as illustrated by the publication of several influential field
535 reports and policy documents. These metrics are designed to help decision
536 takers to inform themselves about the fast-moving and impacting influences of
537 key advances in Artificial Intelligence in general and Machine Learning in
538 particular. In this paper we propose to use such newfound capabilities of AI
539 technologies to augment our AI measuring capabilities. We do so by training a
540 model to classify publications related to ethical issues and concerns. In our
541 methodology we use an expert, manually curated dataset as the training set and
542 then evaluate a large set of research papers. Finally, we highlight the
543 implications of AI metrics, in particular their contribution towards developing
544 trustful and fair AI-based tools and technologies. Keywords: AI Ethics; AI
545 Fairness; AI Measurement. Ethics in Computer Science.
546 </summary></entry><entry><id>http://arxiv.org/abs/2107.04775</id><title>LS3: Latent Space Safe Sets for Long-Horizon Visuomotor Control of Sparse Reward Iterative Tasks (update)</title><updated>2021-09-23T09:06:49.524190+00:00</updated><author><name>Albert Wilcox</name></author><author><name>Ashwin Balakrishna</name></author><author><name>Brijen Thananjeyan</name></author><author><name>Joseph E. Gonzalez</name></author><author><name>Ken Goldberg</name></author><link href="http://arxiv.org/abs/2107.04775" rel="alternate"/><summary>Reinforcement learning (RL) has shown impressive success in exploring
547 high-dimensional environments to learn complex tasks, but can often exhibit
548 unsafe behaviors and require extensive environment interaction when exploration
549 is unconstrained. A promising strategy for learning in dynamically uncertain
550 environments is requiring that the agent can robustly return to learned safe
551 sets, where task success (and therefore safety) can be guaranteed. While this
552 approach has been successful in low-dimensions, enforcing this constraint in
553 environments with visual observations is exceedingly challenging. We present a
554 novel continuous representation for safe sets by framing it as a binary
555 classification problem in a learned latent space, which flexibly scales to
556 image observations. We then present a new algorithm, Latent Space Safe Sets
557 (LS3), which uses this representation for long-horizon tasks with sparse
558 rewards. We evaluate LS3 on 4 domains, including a challenging sequential
559 pushing task in simulation and a physical cable routing task. We find that LS3
560 can use prior task successes to restrict exploration and learn more efficiently
561 than prior algorithms while satisfying constraints. See
562 https://tinyurl.com/latent-ss for code and supplementary material.
563 </summary></entry><entry><id>http://arxiv.org/abs/2106.07857</id><title>Bilateral Personalized Dialogue Generation with Contrastive Learning (update)</title><updated>2021-09-23T09:06:49.523794+00:00</updated><author><name>Bin Li</name></author><author><name>Hanjun Deng</name></author><link href="http://arxiv.org/abs/2106.07857" rel="alternate"/><summary>Generating personalized responses is one of the major challenges in natural
564 human-robot interaction. Current researches in this field mainly focus on
565 generating responses consistent with the robot's pre-assigned persona, while
566 ignoring the user's persona. Such responses may be inappropriate or even
567 offensive, which may lead to the bad user experience. Therefore, we propose a
568 Bilateral Personalized Dialogue Generation (BPDG) method for dyadic
569 conversation, which integrates user and robot personas into dialogue generation
570 via designing a dynamic persona-aware fusion method. To bridge the gap between
571 the learning objective function and evaluation metrics, the Conditional Mutual
572 Information Maximum (CMIM) criterion is adopted with contrastive learning to
573 select the proper response from the generated candidates. Moreover, a bilateral
574 persona accuracy metric is designed to measure the degree of bilateral
575 personalization. Experimental results demonstrate that, compared with several
576 state-of-the-art methods, the final results of the proposed method are more
577 personalized and consistent with bilateral personas in terms of both automatic
578 and manual evaluations.
579 </summary></entry><entry><id>http://arxiv.org/abs/2105.15033</id><title>DiaKG: an Annotated Diabetes Dataset for Medical Knowledge Graph Construction (update)</title><updated>2021-09-23T09:06:49.523206+00:00</updated><author><name>Dejie Chang</name></author><author><name>Mosha Chen</name></author><author><name>Chaozhen Liu</name></author><author><name>Liping Liu</name></author><author><name>Dongdong Li</name></author><author><name>Wei Li</name></author><author><name>Fei Kong</name></author><author><name>Bangchang Liu</name></author><author><name>Xiaobin Luo</name></author><author><name>Ji Qi</name></author><author><name>Qiao Jin</name></author><author><name>Bin Xu</name></author><link href="http://arxiv.org/abs/2105.15033" rel="alternate"/><summary>Knowledge Graph has been proven effective in modeling structured information
580 and conceptual knowledge, especially in the medical domain. However, the lack
581 of high-quality annotated corpora remains a crucial problem for advancing the
582 research and applications on this task. In order to accelerate the research for
583 domain-specific knowledge graphs in the medical domain, we introduce DiaKG, a
584 high-quality Chinese dataset for Diabetes knowledge graph, which contains
585 22,050 entities and 6,890 relations in total. We implement recent typical
586 methods for Named Entity Recognition and Relation Extraction as a benchmark to
587 evaluate the proposed dataset thoroughly. Empirical results show that the DiaKG
588 is challenging for most existing methods and further analysis is conducted to
589 discuss future research direction for improvements. We hope the release of this
590 dataset can assist the construction of diabetes knowledge graphs and facilitate
591 AI-based applications.
592 </summary></entry><entry><id>http://arxiv.org/abs/2105.11844</id><title>CI-dataset and DetDSCI methodology for detecting too small and too large critical infrastructures in satellite images: Airports and electrical substations as case study (update)</title><updated>2021-09-23T09:06:49.522772+00:00</updated><author><name>Francisco Pérez-Hernández</name></author><author><name>José Rodríguez-Ortega</name></author><author><name>Yassir Benhammou</name></author><author><name>Francisco Herrera</name></author><author><name>Siham Tabik</name></author><link href="http://arxiv.org/abs/2105.11844" rel="alternate"/><summary>The detection of critical infrastructures in large territories represented by
593 aerial and satellite images is of high importance in several fields such as in
594 security, anomaly detection, land use planning and land use change detection.
595 However, the detection of such infrastructures is complex as they have highly
596 variable shapes and sizes, i.e., some infrastructures, such as electrical
597 substations, are too small while others, such as airports, are too large.
598 Besides, airports can have a surface area either small or too large with
599 completely different shapes, which makes its correct detection challenging. As
600 far as we know, these limitations have not been tackled yet in previous works.
601 This paper presents (1) a smart Critical Infrastructure dataset, named
602 CI-dataset, organised into two scales, small and large scales critical
603 infrastructures and (2) a two-level resolution-independent critical
604 infrastructure detection (DetDSCI) methodology that first determines the
605 spatial resolution of the input image using a classification model, then
606 analyses the image using the appropriate detector for that spatial resolution.
607 The present study targets two representative classes, airports and electrical
608 substations. Our experiments show that DetDSCI methodology achieves up to
609 37,53% F1 improvement with respect to Faster R-CNN, one of the most influential
610 detection models.
611 </summary></entry><entry><id>http://arxiv.org/abs/2103.13460</id><title>Under Pressure: Learning to Detect Slip with Barometric Tactile Sensors (update)</title><updated>2021-09-23T09:06:49.522356+00:00</updated><author><name>Abhinav Grover</name></author><author><name>Christopher Grebe</name></author><author><name>Philippe Nadeau</name></author><author><name>Jonathan Kelly</name></author><link href="http://arxiv.org/abs/2103.13460" rel="alternate"/><summary>Despite the utility of tactile information, tactile sensors have yet to be
612 widely deployed in industrial robotics settings. Part of the challenge lies in
613 identifying slip and other key events from the tactile data stream. In this
614 paper, we present a learning-based method to detect slip using barometric
615 tactile sensors. Although these sensors have a low resolution, they have many
616 other desirable properties including high reliability and durability, a very
617 slim profile, and a low cost. We are able to achieve slip detection accuracies
618 of greater than 91% while being robust to the speed and direction of the slip
619 motion. Further, we test our detector on two robot manipulation tasks involving
620 common household objects and demonstrate successful generalization to
621 real-world scenarios not seen during training. We show that barometric tactile
622 sensing technology, combined with data-driven learning, is potentially suitable
623 for complex manipulation tasks such as slip compensation.
624 </summary></entry><entry><id>http://arxiv.org/abs/2102.08633</id><title>Open-Retrieval Conversational Machine Reading (update)</title><updated>2021-09-23T09:06:49.521944+00:00</updated><author><name>Yifan Gao</name></author><author><name>Jingjing Li</name></author><author><name>Michael R. Lyu</name></author><author><name>Irwin King</name></author><link href="http://arxiv.org/abs/2102.08633" rel="alternate"/><summary>In conversational machine reading, systems need to interpret natural language
625 rules, answer high-level questions such as "May I qualify for VA health care
626 benefits?", and ask follow-up clarification questions whose answer is necessary
627 to answer the original question. However, existing works assume the rule text
628 is provided for each user question, which neglects the essential retrieval step
629 in real scenarios. In this work, we propose and investigate an open-retrieval
630 setting of conversational machine reading. In the open-retrieval setting, the
631 relevant rule texts are unknown so that a system needs to retrieve
632 question-relevant evidence from a collection of rule texts, and answer users'
633 high-level questions according to multiple retrieved rule texts in a
634 conversational manner. We propose MUDERN, a Multi-passage Discourse-aware
635 Entailment Reasoning Network which extracts conditions in the rule texts
636 through discourse segmentation, conducts multi-passage entailment reasoning to
637 answer user questions directly, or asks clarification follow-up questions to
638 inquiry more information. On our created OR-ShARC dataset, MUDERN achieves the
639 state-of-the-art performance, outperforming existing single-passage
640 conversational machine reading models as well as a new multi-passage
641 conversational machine reading baseline by a large margin. In addition, we
642 conduct in-depth analyses to provide new insights into this new setting and our
643 model.
644 </summary></entry><entry><id>http://arxiv.org/abs/2102.07358</id><title>Weak Adaptation Learning -- Addressing Cross-domain Data Insufficiency with Weak Annotator (update)</title><updated>2021-09-23T09:06:49.521525+00:00</updated><author><name>Shichao Xu</name></author><author><name>Lixu Wang</name></author><author><name>Yixuan Wang</name></author><author><name>Qi Zhu</name></author><link href="http://arxiv.org/abs/2102.07358" rel="alternate"/><summary>Data quantity and quality are crucial factors for data-driven learning
645 methods. In some target problem domains, there are not many data samples
646 available, which could significantly hinder the learning process. While data
647 from similar domains may be leveraged to help through domain adaptation,
648 obtaining high-quality labeled data for those source domains themselves could
649 be difficult or costly. To address such challenges on data insufficiency for
650 classification problem in a target domain, we propose a weak adaptation
651 learning (WAL) approach that leverages unlabeled data from a similar source
652 domain, a low-cost weak annotator that produces labels based on task-specific
653 heuristics, labeling rules, or other methods (albeit with inaccuracy), and a
654 small amount of labeled data in the target domain. Our approach first conducts
655 a theoretical analysis on the error bound of the trained classifier with
656 respect to the data quantity and the performance of the weak annotator, and
657 then introduces a multi-stage weak adaptation learning method to learn an
658 accurate classifier by lowering the error bound. Our experiments demonstrate
659 the effectiveness of our approach in learning an accurate classifier with
660 limited labeled data in the target domain and unlabeled data in the source
661 domain.
662 </summary></entry><entry><id>http://arxiv.org/abs/2102.04394</id><title>Learning with Density Matrices and Random Features (update)</title><updated>2021-09-23T09:06:49.521043+00:00</updated><author><name>Fabio A. González</name></author><author><name>Alejandro Gallego</name></author><author><name>Santiago Toledo-Cortés</name></author><author><name>Vladimir Vargas-Calderón</name></author><link href="http://arxiv.org/abs/2102.04394" rel="alternate"/><summary>A density matrix describes the statistical state of a quantum system. It is a
663 powerful formalism to represent both the quantum and classical uncertainty of
664 quantum systems and to express different statistical operations such as
665 measurement, system combination and expectations as linear algebra operations.
666 This paper explores how density matrices can be used as a building block to
667 build machine learning models exploiting their ability to straightforwardly
668 combine linear algebra and probability. One of the main results of the paper is
669 to show that density matrices coupled with random Fourier features could
670 approximate arbitrary probability distributions over $\mathbb{R}^n$. Based on
671 this finding the paper builds different models for density estimation,
672 classification and regression. These models are differentiable, so it is
673 possible to integrate them with other differentiable components, such as deep
674 learning architectures and to learn their parameters using gradient-based
675 optimization. In addition, the paper presents optimization-less training
676 strategies based on estimation and model averaging. The models are evaluated in
677 benchmark tasks and the results are reported and discussed.
678 </summary></entry><entry><id>http://arxiv.org/abs/2011.11152</id><title>Understanding and Scheduling Weight Decay (update)</title><updated>2021-09-23T09:06:49.520655+00:00</updated><author><name>Zeke Xie</name></author><author><name>Issei Sato</name></author><author><name>Masashi Sugiyama</name></author><link href="http://arxiv.org/abs/2011.11152" rel="alternate"/><summary>Weight decay is a popular and even necessary regularization technique for
679 training deep neural networks that generalize well. Previous work usually
680 interpreted weight decay as a Gaussian prior from the Bayesian perspective.
681 However, weight decay sometimes shows mysterious behaviors beyond the
682 conventional understanding. For example, the optimal weight decay value tends
683 to be zero given long enough training time. Moreover, existing work typically
684 failed to recognize the importance of scheduling weight decay during training.
685 Our work aims at theoretically understanding novel behaviors of weight decay
686 and designing schedulers for weight decay in deep learning. This paper mainly
687 has three contributions. First, we propose a novel theoretical interpretation
688 of weight decay from the perspective of learning dynamics. Second, we propose a
689 novel weight-decay linear scaling rule for large-batch training that
690 proportionally increases weight decay rather than the learning rate as the
691 batch size increases. Third, we provide an effective learning-rate-aware
692 scheduler for weight decay, called the Stable Weight Decay (SWD) method, which,
693 to the best of our knowledge, is the first practical design for weight decay
694 scheduling. In our various experiments, the SWD method often makes improvements
695 over $L_{2}$ Regularization and Decoupled Weight Decay.
696 </summary></entry><entry><id>http://arxiv.org/abs/2011.02073</id><title>MBB: Model-Based Baseline for Efficient Reinforcement Learning (update)</title><updated>2021-09-23T09:06:49.520212+00:00</updated><author><name>Xubo Lyu</name></author><author><name>Site Li</name></author><author><name>Seth Siriya</name></author><author><name>Ye Pu</name></author><author><name>Mo Chen</name></author><link href="http://arxiv.org/abs/2011.02073" rel="alternate"/><summary>Model-free reinforcement learning (RL) is capable of learning control
697 policies for high-dimensional, complex robotic tasks, but tends to be
698 data-inefficient. Model-based RL tends to be more data-efficient but often
699 suffers from learning a high-dimensional model that is good enough for policy
700 improvement. This limits its use to learning simple models for restrictive
701 domains. Optimal control generates solutions without collecting any data,
702 assuming an accurate model of the system and environment is known, which is
703 often true in many control theory applications. However, optimal control cannot
704 be scaled to problems with a high-dimensional state space. In this paper, we
705 propose a novel approach to alleviate data inefficiency of model-free RL in
706 high-dimensional problems by warm-starting the learning process using a
707 lower-dimensional model-based solution. Particularly, we initialize a baseline
708 function for the high-dimensional RL problem via supervision from a
709 lower-dimensional value function, which can be obtained by solving a
710 lower-dimensional problem with a known, approximate model using "classical"
711 techniques such as value iteration or optimal control. Therefore, our approach
712 implicitly exploits the model priors from simplified problem space to
713 facilitate the policy learning in high-dimensional RL tasks. We demonstrate our
714 approach on two representative robotic learning tasks and observe significant
715 improvement in policy performance and learning efficiency. We also evaluate our
716 method empirically with a third task.
717 </summary></entry><entry><id>http://arxiv.org/abs/2004.12908</id><title>Omnidirectional Transfer for Quasilinear Lifelong Learning (update)</title><updated>2021-09-23T09:06:49.519512+00:00</updated><author><name>Joshua T. Vogelstein</name></author><author><name>Jayanta Dey</name></author><author><name>Hayden S. Helm</name></author><author><name>Will LeVine</name></author><author><name>Ronak D. Mehta</name></author><author><name>Ali Geisa</name></author><author><name>Haoyin Xu</name></author><author><name>Gido M. van de Ven</name></author><author><name>Emily Chang</name></author><author><name>Chenyu Gao</name></author><author><name>Weiwei Yang</name></author><author><name>Bryan Tower</name></author><author><name>Jonathan Larson</name></author><author><name>Christopher M. White</name></author><author><name>Carey E. Priebe</name></author><link href="http://arxiv.org/abs/2004.12908" rel="alternate"/><summary>In biological learning, data are used to improve performance not only on the
718 current task, but also on previously encountered and as yet unencountered
719 tasks. In contrast, classical machine learning starts from a blank slate, or
720 tabula rasa, using data only for the single task at hand. While typical
721 transfer learning algorithms can improve performance on future tasks, their
722 performance on prior tasks degrades upon learning new tasks (called
723 catastrophic forgetting). Many recent approaches for continual or lifelong
724 learning have attempted to maintain performance given new tasks. But striving
725 to avoid forgetting sets the goal unnecessarily low: the goal of lifelong
726 learning, whether biological or artificial, should be to improve performance on
727 all tasks (including past and future) with any new data. We propose
728 omnidirectional transfer learning algorithms, which includes two special cases
729 of interest: decision forests and deep networks. Our key insight is the
730 development of the omni-voter layer, which ensembles representations learned
731 independently on all tasks to jointly decide how to proceed on any given new
732 data point, thereby improving performance on both past and future tasks. Our
733 algorithms demonstrate omnidirectional transfer in a variety of simulated and
734 real data scenarios, including tabular data, image data, spoken data, and
735 adversarial tasks. Moreover, they do so with quasilinear space and time
736 complexity.
737 </summary></entry><entry><id>http://arxiv.org/abs/2109.10322</id><title>CondNet: Conditional Classifier for Scene Segmentation</title><updated>2021-09-23T09:06:49.519051+00:00</updated><author><name>Changqian Yu</name></author><author><name>Yuanjie Shao</name></author><author><name>Changxin Gao</name></author><author><name>Nong Sang</name></author><link href="http://arxiv.org/abs/2109.10322" rel="alternate"/><summary>The fully convolutional network (FCN) has achieved tremendous success in
738 dense visual recognition tasks, such as scene segmentation. The last layer of
739 FCN is typically a global classifier (1x1 convolution) to recognize each pixel
740 to a semantic label. We empirically show that this global classifier, ignoring
741 the intra-class distinction, may lead to sub-optimal results.
742 </summary></entry><entry><id>http://arxiv.org/abs/2109.10317</id><title>Introduction to Neural Network Verification</title><updated>2021-09-23T09:06:49.518738+00:00</updated><author><name>Aws Albarghouthi</name></author><link href="http://arxiv.org/abs/2109.10317" rel="alternate"/><summary>Deep learning has transformed the way we think of software and what it can
743 do. But deep neural networks are fragile and their behaviors are often
744 surprising. In many settings, we need to provide formal guarantees on the
745 safety, security, correctness, or robustness of neural networks. This book
746 covers foundational ideas from formal verification and their adaptation to
747 reasoning about neural networks and deep learning.
748 </summary></entry><entry><id>http://arxiv.org/abs/2109.10312</id><title>Example-Driven Model-Based Reinforcement Learning for Solving Long-Horizon Visuomotor Tasks</title><updated>2021-09-23T09:06:49.518267+00:00</updated><author><name>Bohan Wu</name></author><author><name>Suraj Nair</name></author><author><name>Li Fei-Fei</name></author><author><name>Chelsea Finn</name></author><link href="http://arxiv.org/abs/2109.10312" rel="alternate"/><summary>In this paper, we study the problem of learning a repertoire of low-level
749 skills from raw images that can be sequenced to complete long-horizon
750 visuomotor tasks. Reinforcement learning (RL) is a promising approach for
751 acquiring short-horizon skills autonomously. However, the focus of RL
752 algorithms has largely been on the success of those individual skills, more so
753 than learning and grounding a large repertoire of skills that can be sequenced
754 to complete extended multi-stage tasks. The latter demands robustness and
755 persistence, as errors in skills can compound over time, and may require the
756 robot to have a number of primitive skills in its repertoire, rather than just
757 one. To this end, we introduce EMBR, a model-based RL method for learning
758 primitive skills that are suitable for completing long-horizon visuomotor
759 tasks. EMBR learns and plans using a learned model, critic, and success
760 classifier, where the success classifier serves both as a reward function for
761 RL and as a grounding mechanism to continuously detect if the robot should
762 retry a skill when unsuccessful or under perturbations. Further, the learned
763 model is task-agnostic and trained using data from all skills, enabling the
764 robot to efficiently learn a number of distinct primitives. These visuomotor
765 primitive skills and their associated pre- and post-conditions can then be
766 directly combined with off-the-shelf symbolic planners to complete long-horizon
767 tasks. On a Franka Emika robot arm, we find that EMBR enables the robot to
768 complete three long-horizon visuomotor tasks at 85% success rate, such as
769 organizing an office desk, a file cabinet, and drawers, which require
770 sequencing up to 12 skills, involve 14 unique learned primitives, and demand
771 generalization to novel objects.
772 </summary></entry><entry><id>http://arxiv.org/abs/2109.10303</id><title>Computing Complexity-aware Plans Using Kolmogorov Complexity</title><updated>2021-09-23T09:06:49.517919+00:00</updated><author><name>Elis Stefansson</name></author><author><name>Karl H. Johansson</name></author><link href="http://arxiv.org/abs/2109.10303" rel="alternate"/><summary>In this paper, we introduce complexity-aware planning for finite-horizon
773 deterministic finite automata with rewards as outputs, based on Kolmogorov
774 complexity. Kolmogorov complexity is considered since it can detect
775 computational regularities of deterministic optimal policies. We present a
776 planning objective yielding an explicit trade-off between a policy's
777 performance and complexity. It is proven that maximising this objective is
778 non-trivial in the sense that dynamic programming is infeasible. We present two
779 algorithms obtaining low-complexity policies, where the first algorithm obtains
780 a low-complexity optimal policy, and the second algorithm finds a policy
781 maximising performance while maintaining local (stage-wise) complexity
782 constraints. We evaluate the algorithms on a simple navigation task for a
783 mobile robot, where our algorithms yield low-complexity policies that concur
784 with intuition.
785 </summary></entry><entry><id>http://arxiv.org/abs/2109.10285</id><title>Early and Revocable Time Series Classification</title><updated>2021-09-23T09:06:49.517510+00:00</updated><author><name>Youssef Achenchabe</name></author><author><name>Alexis Bondu</name></author><author><name>Antoine Cornuéjols</name></author><author><name>Vincent Lemaire</name></author><link href="http://arxiv.org/abs/2109.10285" rel="alternate"/><summary>Many approaches have been proposed for early classification of time series in
786 light of itssignificance in a wide range of applications including healthcare,
787 transportation and fi-nance. Until now, the early classification problem has
788 been dealt with by considering onlyirrevocable decisions. This paper introduces
789 a new problem calledearly and revocabletimeseries classification, where the
790 decision maker can revoke its earlier decisions based on thenew available
791 measurements. In order to formalize and tackle this problem, we propose anew
792 cost-based framework and derive two new approaches from it. The first approach
793 doesnot consider explicitly the cost of changing decision, while the second one
794 does. Exten-sive experiments are conducted to evaluate these approaches on a
795 large benchmark of realdatasets. The empirical results obtained convincingly
796 show (i) that the ability of revok-ing decisions significantly improves
797 performance over the irrevocable regime, and (ii) thattaking into account the
798 cost of changing decision brings even better results in
799 general.Keywords:revocable decisions, cost estimation, online decision making
800 </summary></entry><entry><id>http://arxiv.org/abs/2109.10246</id><title>Does Vision-and-Language Pretraining Improve Lexical Grounding?</title><updated>2021-09-23T09:06:49.517131+00:00</updated><author><name>Tian Yun</name></author><author><name>Chen Sun</name></author><author><name>Ellie Pavlick</name></author><link href="http://arxiv.org/abs/2109.10246" rel="alternate"/><summary>Linguistic representations derived from text alone have been criticized for
801 their lack of grounding, i.e., connecting words to their meanings in the
802 physical world. Vision-and-Language (VL) models, trained jointly on text and
803 image or video data, have been offered as a response to such criticisms.
804 However, while VL pretraining has shown success on multimodal tasks such as
805 visual question answering, it is not yet known how the internal linguistic
806 representations themselves compare to their text-only counterparts. This paper
807 compares the semantic representations learned via VL vs. text-only pretraining
808 for two recent VL models using a suite of analyses (clustering, probing, and
809 performance on a commonsense question answering task) in a language-only
810 setting. We find that the multimodal models fail to significantly outperform
811 the text-only variants, suggesting that future work is required if multimodal
812 pretraining is to be pursued as a means of improving NLP in general.
813 </summary></entry><entry><id>http://arxiv.org/abs/2109.10231</id><title>SalienTrack: providing salient information for semi-automated self-tracking feedback with model explanations</title><updated>2021-09-23T09:06:49.516665+00:00</updated><author><name>Yunlong Wang</name></author><author><name>Jiaying Liu</name></author><author><name>Homin Park</name></author><author><name>Jordan Schultz-McArdle</name></author><author><name>Stephanie Rosenthal</name></author><author><name>Brian Y Lim</name></author><link href="http://arxiv.org/abs/2109.10231" rel="alternate"/><summary>Self-tracking can improve people's awareness of their unhealthy behaviors to
814 provide insights towards behavior change. Prior work has explored how
815 self-trackers reflect on their logged data, but it remains unclear how much
816 they learn from the tracking feedback, and which information is more useful.
817 Indeed, the feedback can still be overwhelming, and making it concise can
818 improve learning by increasing focus and reducing interpretation burden. We
819 conducted a field study of mobile food logging with two feedback modes (manual
820 journaling and automatic annotation of food images) and identified learning
821 differences regarding nutrition, assessment, behavioral, and contextual
822 information. We propose a Self-Tracking Feedback Saliency Framework to define
823 when to provide feedback, on which specific information, why those details, and
824 how to present them (as manual inquiry or automatic feedback). We propose
825 SalienTrack to implement these requirements. Using the data collected from the
826 user study, we trained a machine learning model to predict whether a user would
827 learn from each tracked event. Using explainable AI (XAI) techniques, we
828 identified the most salient features per instance and why they lead to positive
829 learning outcomes. We discuss implications for learnability in self-tracking,
830 and how adding model explainability expands opportunities for improving
831 feedback experience.
832 </summary></entry><entry><id>http://arxiv.org/abs/2109.10217</id><title>Shape Inference and Grammar Induction for Example-based Procedural Generation</title><updated>2021-09-23T09:06:49.516292+00:00</updated><author><name>Gillis Hermans</name></author><author><name>Thomas Winters</name></author><author><name>Luc De Raedt</name></author><link href="http://arxiv.org/abs/2109.10217" rel="alternate"/><summary>Designers increasingly rely on procedural generation for automatic generation
833 of content in various industries. These techniques require extensive knowledge
834 of the desired content, and about how to actually implement such procedural
835 methods. Algorithms for learning interpretable generative models from example
836 content could alleviate both difficulties. We propose SIGI, a novel method for
837 inferring shapes and inducing a shape grammar from grid-based 3D building
838 examples. This interpretable grammar is well-suited for co-creative design.
839 Applied to Minecraft buildings, we show how the shape grammar can be used to
840 automatically generate new buildings in a similar style.
841 </summary></entry><entry><id>http://arxiv.org/abs/2109.10200</id><title>Off-line approximate dynamic programming for the vehicle routing problem with stochastic customers and demands via decentralized decision-making</title><updated>2021-09-23T09:06:49.515928+00:00</updated><author><name>Mohsen Dastpak</name></author><author><name>Fausto Errico</name></author><link href="http://arxiv.org/abs/2109.10200" rel="alternate"/><summary>This paper studies a stochastic variant of the vehicle routing problem (VRP)
842 where both customer locations and demands are uncertain. In particular,
843 potential customers are not restricted to a predefined customer set but are
844 continuously spatially distributed in a given service area. The objective is to
845 maximize the served demands while fulfilling vehicle capacities and time
846 restrictions. We call this problem the VRP with stochastic customers and
847 demands (VRPSCD). For this problem, we first propose a Markov Decision Process
848 (MDP) formulation representing the classical centralized decision-making
849 perspective where one decision-maker establishes the routes of all vehicles.
850 While the resulting formulation turns out to be intractable, it provides us
851 with the ground to develop a new MDP formulation of the VRPSCD representing a
852 decentralized decision-making framework, where vehicles autonomously establish
853 their own routes. This new formulation allows us to develop several strategies
854 to reduce the dimension of the state and action spaces, resulting in a
855 considerably more tractable problem. We solve the decentralized problem via
856 Reinforcement Learning, and in particular, we develop a Q-learning algorithm
857 featuring state-of-the-art acceleration techniques such as Replay Memory and
858 Double Q Network. Computational results show that our method considerably
859 outperforms two commonly adopted benchmark policies (random and heuristic).
860 Moreover, when comparing with existing literature, we show that our approach
861 can compete with specialized methods developed for the particular case of the
862 VRPSCD where customer locations and expected demands are known in advance.
863 Finally, we show that the value functions and policies obtained by our
864 algorithm can be easily embedded in Rollout algorithms, thus further improving
865 their performances.
866 </summary></entry><entry><id>http://arxiv.org/abs/2109.10199</id><title>Design and implementation of a parsimonious neuromorphic PID for onboard altitude control for MAVs using neuromorphic processors</title><updated>2021-09-23T09:06:49.515541+00:00</updated><author><name>Stein Stroobants</name></author><author><name>Julien Dupeyroux</name></author><author><name>Guido de Croon</name></author><link href="http://arxiv.org/abs/2109.10199" rel="alternate"/><summary>The great promises of neuromorphic sensing and processing for robotics have
867 led researchers and engineers to investigate novel models for robust and
868 reliable control of autonomous robots (navigation, obstacle detection and
869 avoidance, etc.), especially for quadrotors in challenging contexts such as
870 drone racing and aggressive maneuvers. Using spiking neural networks, these
871 models can be run on neuromorphic hardware to benefit from outstanding update
872 rates and high energy efficiency. Yet, low-level controllers are often
873 neglected and remain outside of the neuromorphic loop. Designing low-level
874 neuromorphic controllers is crucial to remove the standard PID, and therefore
875 benefit from all the advantages of closing the neuromorphic loop. In this
876 paper, we propose a parsimonious and adjustable neuromorphic PID controller,
877 endowed with a minimal number of 93 neurons sparsely connected to achieve
878 autonomous, onboard altitude control of a quadrotor equipped with Intel's Loihi
879 neuromorphic chip. We successfully demonstrate the robustness of our proposed
880 network in a set of experiments where the quadrotor is requested to reach a
881 target altitude from take-off. Our results confirm the suitability of such
882 low-level neuromorphic controllers, ultimately with a very high update
883 frequency.
884 </summary></entry><entry><id>http://arxiv.org/abs/2109.10187</id><title>Oriented Object Detection in Aerial Images Based on Area Ratio of Parallelogram</title><updated>2021-09-23T09:06:49.515064+00:00</updated><author><name>Xinyu Yu</name></author><author><name>Mi Lin</name></author><author><name>Jiangping Lu</name></author><author><name>Linlin Ou</name></author><link href="http://arxiv.org/abs/2109.10187" rel="alternate"/><summary>Rotated object detection is a challenging task in aerial images as the object
885 in aerial images are displayed in arbitrary directions and usually densely
886 packed. Although considerable progress has been made, there are still
887 challenges that existing regression-based rotation detectors suffer the problem
888 of discontinuous boundaries, which is directly caused by angular periodicity or
889 corner ordering. In this paper, we propose a simple effective framework to
890 address the above challenges. Instead of directly regressing the five
891 parameters (coordinates of the central point, width, height, and rotation
892 angle) or the four vertices, we use the area ratio of parallelogram (ARP) to
893 accurately describe a multi-oriented object. Specifically, we regress
894 coordinates of center point, height and width of minimum circumscribed
895 rectangle of oriented object and three area ratios {\lambda}_1, {\lambda}_2 and
896 {\lambda}_3. This may facilitate the offset learning and avoid the issue of
897 angular periodicity or label points sequence for oriented objects. To further
898 remedy the confusion issue nearly horizontal objects, we employ the area ratio
899 between the object and its horizontal bounding box (minimum circumscribed
900 rectangle) to guide the selection of horizontal or oriented detection for each
901 object. We also propose a rotation efficient IoU loss (R-EIoU) to connect the
902 horizontal bounding box with the three area ratios and improve the accurate for
903 the rotating bounding box. Experimental results on three remote sensing
904 datasets including HRSC2016, DOTA and UCAS-AOD and scene text including
905 ICDAR2015 show that our method achieves superior detection performance compared
906 with many state-of-the-art approaches. The code and model will be coming with
907 paper published.
908 </summary></entry><entry><id>http://arxiv.org/abs/2109.10173</id><title>Long-Term Exploration in Persistent MDPs</title><updated>2021-09-23T09:06:49.514674+00:00</updated><author><name>Leonid Ugadiarov</name></author><author><name>Alexey Skrynnik</name></author><author><name>Aleksandr I. Panov</name></author><link href="http://arxiv.org/abs/2109.10173" rel="alternate"/><summary>Exploration is an essential part of reinforcement learning, which restricts
909 the quality of learned policy. Hard-exploration environments are defined by
910 huge state space and sparse rewards. In such conditions, an exhaustive
911 exploration of the environment is often impossible, and the successful training
912 of an agent requires a lot of interaction steps. In this paper, we propose an
913 exploration method called Rollback-Explore (RbExplore), which utilizes the
914 concept of the persistent Markov decision process, in which agents during
915 training can roll back to visited states. We test our algorithm in the
916 hard-exploration Prince of Persia game, without rewards and domain knowledge.
917 At all used levels of the game, our agent outperforms or shows comparable
918 results with state-of-the-art curiosity methods with knowledge-based intrinsic
919 motivation: ICM and RND. An implementation of RbExplore can be found at
920 https://github.com/cds-mipt/RbExplore.
921 </summary></entry><entry><id>http://arxiv.org/abs/2109.10149</id><title>Interpretable Directed Diversity: Leveraging Model Explanations for Iterative Crowd Ideation</title><updated>2021-09-23T09:06:49.514210+00:00</updated><author><name>Yunlong Wang</name></author><author><name>Priyadarshini Venkatesh</name></author><author><name>Brian Y. Lim</name></author><link href="http://arxiv.org/abs/2109.10149" rel="alternate"/><summary>Feedback can help crowdworkers to improve their ideations. However, current
922 feedback methods require human assessment from facilitators or peers. This is
923 not scalable to large crowds. We propose Interpretable Directed Diversity to
924 automatically predict ideation quality and diversity scores, and provide AI
925 explanations - Attribution, Contrastive Attribution, and Counterfactual
926 Suggestions - for deeper feedback on why ideations were scored (low), and how
927 to get higher scores. These explanations provide multi-faceted feedback as
928 users iteratively improve their ideation. We conducted think aloud and
929 controlled user studies to understand how various explanations are used, and
930 evaluated whether explanations improve ideation diversity and quality. Users
931 appreciated that explanation feedback helped focus their efforts and provided
932 directions for improvement. This resulted in explanations improving diversity
933 compared to no feedback or feedback with predictions only. Hence, our approach
934 opens opportunities for explainable AI towards scalable and rich feedback for
935 iterative crowd ideation.
936 </summary></entry><entry><id>http://arxiv.org/abs/2109.10129</id><title>Learning General Optimal Policies with Graph Neural Networks: Expressive Power, Transparency, and Limits</title><updated>2021-09-23T09:06:49.513806+00:00</updated><author><name>Simon Ståhlberg</name></author><author><name>Blai Bonet</name></author><author><name>Hector Geffner</name></author><link href="http://arxiv.org/abs/2109.10129" rel="alternate"/><summary>It has been recently shown that general policies for many classical planning
937 domains can be expressed and learned in terms of a pool of features defined
938 from the domain predicates using a description logic grammar. At the same time,
939 most description logics correspond to a fragment of $k$-variable counting logic
940 ($C_k$) for $k=2$, that has been shown to provide a tight characterization of
941 the expressive power of graph neural networks. In this work, we make use of
942 these results to understand the power and limits of using graph neural networks
943 (GNNs) for learning optimal general policies over a number of tractable
944 planning domains where such policies are known to exist. For this, we train a
945 simple GNN in a supervised manner to approximate the optimal value function
946 $V^{*}(s)$ of a number of sample states $s$. As predicted by the theory, it is
947 observed that general optimal policies are obtained in domains where general
948 optimal value functions can be defined with $C_2$ features but not in those
949 requiring more expressive $C_3$ features. In addition, it is observed that the
950 features learned are in close correspondence with the features needed to
951 express $V^{*}$ in closed form. The theory and the analysis of the domains let
952 us understand the features that are actually learned as well as those that
953 cannot be learned in this way, and let us move in a principled manner from a
954 combinatorial optimization approach to learning general policies to a
955 potentially, more robust and scalable approach based on deep learning.
956 </summary></entry><entry><id>http://arxiv.org/abs/2109.10106</id><title>Distributed Mission Planning of Complex Tasks for Heterogeneous Multi-Robot Teams</title><updated>2021-09-23T09:06:49.513430+00:00</updated><author><name>Barbara Arbanas Ferreira</name></author><author><name>Tamara Petrović</name></author><author><name>Stjepan Bogdan</name></author><link href="http://arxiv.org/abs/2109.10106" rel="alternate"/><summary>In this paper, we propose a distributed multi-stage optimization method for
957 planning complex missions for heterogeneous multi-robot teams. This class of
958 problems involves tasks that can be executed in different ways and are
959 associated with cross-schedule dependencies that constrain the schedules of the
960 different robots in the system. The proposed approach involves a
961 multi-objective heuristic search of the mission, represented as a hierarchical
962 tree that defines the mission goal. This procedure outputs several favorable
963 ways to fulfill the mission, which directly feed into the next stage of the
964 method. We propose a distributed metaheuristic based on evolutionary
965 computation to allocate tasks and generate schedules for the set of chosen
966 decompositions. The method is evaluated in a simulation setup of an automated
967 greenhouse use case, where we demonstrate the method's ability to adapt the
968 planning strategy depending on the available robots and the given optimization
969 criteria.
970 </summary></entry><entry><id>http://arxiv.org/abs/2109.10100</id><title>A Novel Structured Natural Gradient Descent for Deep Learning</title><updated>2021-09-23T09:06:49.513082+00:00</updated><author><name>Weihua Liu</name></author><author><name>Xiabi Liu</name></author><link href="http://arxiv.org/abs/2109.10100" rel="alternate"/><summary>Natural gradient descent (NGD) provided deep insights and powerful tools to
971 deep neural networks. However the computation of Fisher information matrix
972 becomes more and more difficult as the network structure turns large and
973 complex. This paper proposes a new optimization method whose main idea is to
974 accurately replace the natural gradient optimization by reconstructing the
975 network. More specifically, we reconstruct the structure of the deep neural
976 network, and optimize the new network using traditional gradient descent (GD).
977 The reconstructed network achieves the effect of the optimization way with
978 natural gradient descent. Experimental results show that our optimization
979 method can accelerate the convergence of deep network models and achieve better
980 performance than GD while sharing its computational simplicity.
981 </summary></entry><entry><id>http://arxiv.org/abs/2109.10086</id><title>SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval</title><updated>2021-09-23T09:06:49.512667+00:00</updated><author><name>Thibault Formal</name></author><author><name>Carlos Lassance</name></author><author><name>Benjamin Piwowarski</name></author><author><name>Stéphane Clinchant</name></author><link href="http://arxiv.org/abs/2109.10086" rel="alternate"/><summary>In neural Information Retrieval (IR), ongoing research is directed towards
982 improving the first retriever in ranking pipelines. Learning dense embeddings
983 to conduct retrieval using efficient approximate nearest neighbors methods has
984 proven to work well. Meanwhile, there has been a growing interest in learning
985 \emph{sparse} representations for documents and queries, that could inherit
986 from the desirable properties of bag-of-words models such as the exact matching
987 of terms and the efficiency of inverted indexes. Introduced recently, the
988 SPLADE model provides highly sparse representations and competitive results
989 with respect to state-of-the-art dense and sparse approaches. In this paper, we
990 build on SPLADE and propose several significant improvements in terms of
991 effectiveness and/or efficiency. More specifically, we modify the pooling
992 mechanism, benchmark a model solely based on document expansion, and introduce
993 models trained with distillation. We also report results on the BEIR benchmark.
994 Overall, SPLADE is considerably improved with more than $9$\% gains on NDCG@10
995 on TREC DL 2019, leading to state-of-the-art results on the BEIR benchmark.
996 </summary></entry><entry><id>http://arxiv.org/abs/2109.10085</id><title>Heterogeneous Ensemble for ESG Ratings Prediction</title><updated>2021-09-23T09:06:49.512201+00:00</updated><author><name>Tim Krappel</name></author><author><name>Alex Bogun</name></author><author><name>Damian Borth</name></author><link href="http://arxiv.org/abs/2109.10085" rel="alternate"/><summary>Over the past years, topics ranging from climate change to human rights have
997 seen increasing importance for investment decisions. Hence, investors (asset
998 managers and asset owners) who wanted to incorporate these issues started to
999 assess companies based on how they handle such topics. For this assessment,
1000 investors rely on specialized rating agencies that issue ratings along the
1001 environmental, social and governance (ESG) dimensions. Such ratings allow them
1002 to make investment decisions in favor of sustainability. However, rating
1003 agencies base their analysis on subjective assessment of sustainability
1004 reports, not provided by every company. Furthermore, due to human labor
1005 involved, rating agencies are currently facing the challenge to scale up the
1006 coverage in a timely manner.
1007 </summary></entry><entry><id>http://arxiv.org/abs/2109.10065</id><title>Comparison of Neural Network based Soft Computing Techniques for Electromagnetic Modeling of a Microstrip Patch Antenna</title><updated>2021-09-23T09:06:49.511839+00:00</updated><author><name>Yuvraj Singh Malhi</name></author><author><name>Navneet Gupta</name></author><link href="http://arxiv.org/abs/2109.10065" rel="alternate"/><summary>This paper presents the comparison of various neural networks and algorithms
1008 based on accuracy, quickness, and consistency for antenna modelling. Using
1009 Nntool by MATLAB, 22 different combinations of networks and training algorithms
1010 are used to predict the dimensions of a rectangular microstrip antenna using
1011 dielectric constant, height of substrate, and frequency of operation as input.
1012 Comparison and characterization of networks is done based on accuracy, mean
1013 square error, and training time. Algorithms, on the other hand, are analyzed by
1014 their accuracy, speed, reliability, and smoothness in the training process.
1015 Finally, these results are analyzed, and recommendations are made for each
1016 neural network and algorithm based on uses, advantages, and disadvantages. For
1017 example, it is observed that Reduced Radial Bias network is the most accurate
1018 network and Scaled Conjugate Gradient is the most reliable algorithm for
1019 electromagnetic modelling. This paper will help a researcher find the optimum
1020 network and algorithm directly without doing time-taking experimentation.
1021 </summary></entry><entry><id>http://arxiv.org/abs/2109.10057</id><title>LOTR: Face Landmark Localization Using Localization Transformer</title><updated>2021-09-23T09:06:49.511324+00:00</updated><author><name>Ukrit Watchareeruetai</name></author><author><name>Benjaphan Sommanna</name></author><author><name>Sanjana Jain</name></author><author><name>Pavit Noinongyao</name></author><author><name>Ankush Ganguly</name></author><author><name>Aubin Samacoits</name></author><author><name>Samuel W.F. Earp</name></author><author><name>Nakarin Sritrakool</name></author><link href="http://arxiv.org/abs/2109.10057" rel="alternate"/><summary>This paper presents a novel Transformer-based facial landmark localization
1022 network named Localization Transformer (LOTR). The proposed framework is a
1023 direct coordinate regression approach leveraging a Transformer network to
1024 better utilize the spatial information in the feature map. An LOTR model
1025 consists of three main modules: 1) a visual backbone that converts an input
1026 image into a feature map, 2) a Transformer module that improves the feature
1027 representation from the visual backbone, and 3) a landmark prediction head that
1028 directly predicts the landmark coordinates from the Transformer's
1029 representation. Given cropped-and-aligned face images, the proposed LOTR can be
1030 trained end-to-end without requiring any post-processing steps. This paper also
1031 introduces the smooth-Wing loss function, which addresses the gradient
1032 discontinuity of the Wing loss, leading to better convergence than standard
1033 loss functions such as L1, L2, and Wing loss. Experimental results on the JD
1034 landmark dataset provided by the First Grand Challenge of 106-Point Facial
1035 Landmark Localization indicate the superiority of LOTR over the existing
1036 methods on the leaderboard and two recent heatmap-based approaches.
1037 </summary></entry><entry><id>http://arxiv.org/abs/2109.10047</id><title>Search For Deep Graph Neural Networks</title><updated>2021-09-23T09:06:49.510946+00:00</updated><author><name>Guosheng Feng</name></author><author><name>Chunnan Wang</name></author><author><name>Hongzhi Wang</name></author><link href="http://arxiv.org/abs/2109.10047" rel="alternate"/><summary>Current GNN-oriented NAS methods focus on the search for different layer
1038 aggregate components with shallow and simple architectures, which are limited
1039 by the 'over-smooth' problem. To further explore the benefits from structural
1040 diversity and depth of GNN architectures, we propose a GNN generation pipeline
1041 with a novel two-stage search space, which aims at automatically generating
1042 high-performance while transferable deep GNN models in a block-wise manner.
1043 Meanwhile, to alleviate the 'over-smooth' problem, we incorporate multiple
1044 flexible residual connection in our search space and apply identity mapping in
1045 the basic GNN layers. For the search algorithm, we use deep-q-learning with
1046 epsilon-greedy exploration strategy and reward reshaping. Extensive experiments
1047 on real-world datasets show that our generated GNN models outperforms existing
1048 manually designed and NAS-based ones.
1049 </summary></entry><entry><id>http://arxiv.org/abs/2109.10034</id><title>Learning offline: memory replay in biological and artificial reinforcement learning</title><updated>2021-09-23T09:06:49.510518+00:00</updated><author><name>Emma L. Roscow</name></author><author><name>Raymond Chua</name></author><author><name>Rui Ponte Costa</name></author><author><name>Matt W. Jones</name></author><author><name>Nathan Lepora</name></author><link href="http://arxiv.org/abs/2109.10034" rel="alternate"/><summary>Learning to act in an environment to maximise rewards is among the brain's
1050 key functions. This process has often been conceptualised within the framework
1051 of reinforcement learning, which has also gained prominence in machine learning
1052 and artificial intelligence (AI) as a way to optimise decision-making. A common
1053 aspect of both biological and machine reinforcement learning is the
1054 reactivation of previously experienced episodes, referred to as replay. Replay
1055 is important for memory consolidation in biological neural networks, and is key
1056 to stabilising learning in deep neural networks. Here, we review recent
1057 developments concerning the functional roles of replay in the fields of
1058 neuroscience and AI. Complementary progress suggests how replay might support
1059 learning processes, including generalisation and continual learning, affording
1060 opportunities to transfer knowledge across the two fields to advance the
1061 understanding of biological and artificial learning and memory.
1062 </summary></entry><entry><id>http://arxiv.org/abs/2109.10020</id><title>Online Multi-horizon Transaction Metric Estimation with Multi-modal Learning in Payment Networks</title><updated>2021-09-23T09:06:49.510005+00:00</updated><author><name>Chin-Chia Michael Yeh</name></author><author><name>Zhongfang Zhuang</name></author><author><name>Junpeng Wang</name></author><author><name>Yan Zheng</name></author><author><name>Javid Ebrahimi</name></author><author><name>Ryan Mercer</name></author><author><name>Liang Wang</name></author><author><name>Wei Zhang</name></author><link href="http://arxiv.org/abs/2109.10020" rel="alternate"/><summary>Predicting metrics associated with entities' transnational behavior within
1063 payment processing networks is essential for system monitoring. Multivariate
1064 time series, aggregated from the past transaction history, can provide valuable
1065 insights for such prediction. The general multivariate time series prediction
1066 problem has been well studied and applied across several domains, including
1067 manufacturing, medical, and entomology. However, new domain-related challenges
1068 associated with the data such as concept drift and multi-modality have surfaced
1069 in addition to the real-time requirements of handling the payment transaction
1070 data at scale. In this work, we study the problem of multivariate time series
1071 prediction for estimating transaction metrics associated with entities in the
1072 payment transaction database. We propose a model with five unique components to
1073 estimate the transaction metrics from multi-modality data. Four of these
1074 components capture interaction, temporal, scale, and shape perspectives, and
1075 the fifth component fuses these perspectives together. We also propose a hybrid
1076 offline/online training scheme to address concept drift in the data and fulfill
1077 the real-time requirements. Combining the estimation model with a graphical
1078 user interface, the prototype transaction metric estimation system has
1079 demonstrated its potential benefit as a tool for improving a payment processing
1080 company's system monitoring capability.
1081 </summary></entry><entry><id>http://arxiv.org/abs/2109.10016</id><title>CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval</title><updated>2021-09-23T09:06:49.509605+00:00</updated><author><name>Zhijian Hou</name></author><author><name>Chong-Wah Ngo</name></author><author><name>Wing Kwong Chan</name></author><link href="http://arxiv.org/abs/2109.10016" rel="alternate"/><summary>This paper tackles a recently proposed Video Corpus Moment Retrieval task.
1082 This task is essential because advanced video retrieval applications should
1083 enable users to retrieve a precise moment from a large video corpus. We propose
1084 a novel CONtextual QUery-awarE Ranking~(CONQUER) model for effective moment
1085 localization and ranking. CONQUER explores query context for multi-modal fusion
1086 and representation learning in two different steps. The first step derives
1087 fusion weights for the adaptive combination of multi-modal video content. The
1088 second step performs bi-directional attention to tightly couple video and query
1089 as a single joint representation for moment localization. As query context is
1090 fully engaged in video representation learning, from feature fusion to
1091 transformation, the resulting feature is user-centered and has a larger
1092 capacity in capturing multi-modal signals specific to query. We conduct studies
1093 on two datasets, TVR for closed-world TV episodes and DiDeMo for open-world
1094 user-generated videos, to investigate the potential advantages of fusing video
1095 and query online as a joint representation for moment retrieval.
1096 </summary></entry><entry><id>http://arxiv.org/abs/2109.10011</id><title>Unsupervised Abstract Reasoning for Raven's Problem Matrices</title><updated>2021-09-23T09:06:49.509153+00:00</updated><author><name>Tao Zhuo</name></author><author><name>Qiang Huang</name></author><author><name>Mohan Kankanhalli</name></author><link href="http://arxiv.org/abs/2109.10011" rel="alternate"/><summary>Raven's Progressive Matrices (RPM) is highly correlated with human
1097 intelligence, and it has been widely used to measure the abstract reasoning
1098 ability of humans. In this paper, to study the abstract reasoning capability of
1099 deep neural networks, we propose the first unsupervised learning method for
1100 solving RPM problems. Since the ground truth labels are not allowed, we design
1101 a pseudo target based on the prior constraints of the RPM formulation to
1102 approximate the ground truth label, which effectively converts the unsupervised
1103 learning strategy into a supervised one. However, the correct answer is wrongly
1104 labelled by the pseudo target, and thus the noisy contrast will lead to
1105 inaccurate model training. To alleviate this issue, we propose to improve the
1106 model performance with negative answers. Moreover, we develop a
1107 decentralization method to adapt the feature representation to different RPM
1108 problems. Extensive experiments on three datasets demonstrate that our method
1109 even outperforms some of the supervised approaches. Our code is available at
1110 https://github.com/visiontao/ncd.
1111 </summary></entry><entry><id>http://arxiv.org/abs/2109.10007</id><title>Generating Local Maps of Science using Deep Bibliographic Coupling</title><updated>2021-09-23T09:06:49.508792+00:00</updated><author><name>Gaëlle Candel</name></author><author><name>David Naccache</name></author><link href="http://arxiv.org/abs/2109.10007" rel="alternate"/><summary>Bibliographic and co-citation coupling are two analytical methods widely used
1112 to measure the degree of similarity between scientific papers. These approaches
1113 are intuitive, easy to put into practice, and computationally cheap. Moreover,
1114 they have been used to generate a map of science, allowing visualizing research
1115 field interactions. Nonetheless, these methods do not work unless two papers
1116 share a standard reference, limiting the two papers usability with no direct
1117 connection. In this work, we propose to extend bibliographic coupling to the
1118 deep neighborhood, by using graph diffusion methods. This method allows
1119 defining similarity between any two papers, making it possible to generate a
1120 local map of science, highlighting field organization.
1121 </summary></entry><entry><id>http://arxiv.org/abs/2109.09975</id><title>Fast nonlinear risk assessment for autonomous vehicles using learned conditional probabilistic models of agent futures</title><updated>2021-09-23T09:06:49.508363+00:00</updated><author><name>Ashkan Jasour</name></author><author><name>Xin Huang</name></author><author><name>Allen Wang</name></author><author><name>Brian C. William</name></author><link href="http://arxiv.org/abs/2109.09975" rel="alternate"/><summary>This paper presents fast non-sampling based methods to assess the risk for
1122 trajectories of autonomous vehicles when probabilistic predictions of other
1123 agents' futures are generated by deep neural networks (DNNs). The presented
1124 methods address a wide range of representations for uncertain predictions
1125 including both Gaussian and non-Gaussian mixture models to predict both agent
1126 positions and control inputs conditioned on the scene contexts. We show that
1127 the problem of risk assessment when Gaussian mixture models (GMMs) of agent
1128 positions are learned can be solved rapidly to arbitrary levels of accuracy
1129 with existing numerical methods. To address the problem of risk assessment for
1130 non-Gaussian mixture models of agent position, we propose finding upper bounds
1131 on risk using nonlinear Chebyshev's Inequality and sums-of-squares (SOS)
1132 programming; they are both of interest as the former is much faster while the
1133 latter can be arbitrarily tight. These approaches only require higher order
1134 statistical moments of agent positions to determine upper bounds on risk. To
1135 perform risk assessment when models are learned for agent control inputs as
1136 opposed to positions, we propagate the moments of uncertain control inputs
1137 through the nonlinear motion dynamics to obtain the exact moments of uncertain
1138 position over the planning horizon. To this end, we construct deterministic
1139 linear dynamical systems that govern the exact time evolution of the moments of
1140 uncertain position in the presence of uncertain control inputs. The presented
1141 methods are demonstrated on realistic predictions from DNNs trained on the
1142 Argoverse and CARLA datasets and are shown to be effective for rapidly
1143 assessing the probability of low probability events.
1144 </summary></entry><entry><id>http://arxiv.org/abs/2109.09968</id><title>Generalization in Text-based Games via Hierarchical Reinforcement Learning</title><updated>2021-09-23T09:06:49.507912+00:00</updated><author><name>Yunqiu Xu</name></author><author><name>Meng Fang</name></author><author><name>Ling Chen</name></author><author><name>Yali Du</name></author><author><name>Chengqi Zhang</name></author><link href="http://arxiv.org/abs/2109.09968" rel="alternate"/><summary>Deep reinforcement learning provides a promising approach for text-based
1145 games in studying natural language communication between humans and artificial
1146 agents. However, the generalization still remains a big challenge as the agents
1147 depend critically on the complexity and variety of training tasks. In this
1148 paper, we address this problem by introducing a hierarchical framework built
1149 upon the knowledge graph-based RL agent. In the high level, a meta-policy is
1150 executed to decompose the whole game into a set of subtasks specified by
1151 textual goals, and select one of them based on the KG. Then a sub-policy in the
1152 low level is executed to conduct goal-conditioned reinforcement learning. We
1153 carry out experiments on games with various difficulty levels and show that the
1154 proposed method enjoys favorable generalizability.
1155 </summary></entry><entry><id>http://arxiv.org/abs/2109.09960</id><title>Enforcing Mutual Consistency of Hard Regions for Semi-supervised Medical Image Segmentation</title><updated>2021-09-23T09:06:49.507378+00:00</updated><author><name>Yicheng Wu</name></author><author><name>Zongyuan Ge</name></author><author><name>Donghao Zhang</name></author><author><name>Minfeng Xu</name></author><author><name>Lei Zhang</name></author><author><name>Yong Xia</name></author><author><name>Jianfei Cai</name></author><link href="http://arxiv.org/abs/2109.09960" rel="alternate"/><summary>In this paper, we proposed a novel mutual consistency network (MC-Net+) to
1156 effectively exploit the unlabeled hard regions for semi-supervised medical
1157 image segmentation. The MC-Net+ model is motivated by the observation that deep
1158 models trained with limited annotations are prone to output highly uncertain
1159 and easily mis-classified predictions in the ambiguous regions (e.g. adhesive
1160 edges or thin branches) for the image segmentation task. Leveraging these
1161 region-level challenging samples can make the semi-supervised segmentation
1162 model training more effective. Therefore, our proposed MC-Net+ model consists
1163 of two new designs. First, the model contains one shared encoder and multiple
1164 sightly different decoders (i.e. using different up-sampling strategies). The
1165 statistical discrepancy of multiple decoders' outputs is computed to denote the
1166 model's uncertainty, which indicates the unlabeled hard regions. Second, a new
1167 mutual consistency constraint is enforced between one decoder's probability
1168 output and other decoders' soft pseudo labels. In this way, we minimize the
1169 model's uncertainty during training and force the model to generate invariant
1170 and low-entropy results in such challenging areas of unlabeled data, in order
1171 to learn a generalized feature representation. We compared the segmentation
1172 results of the MC-Net+ with five state-of-the-art semi-supervised approaches on
1173 three public medical datasets. Extension experiments with two common
1174 semi-supervised settings demonstrate the superior performance of our model over
1175 other existing methods, which sets a new state of the art for semi-supervised
1176 medical image segmentation.
1177 </summary></entry><entry><id>http://arxiv.org/abs/2109.09946</id><title>Identifying biases in legal data: An algorithmic fairness perspective</title><updated>2021-09-23T09:06:49.507009+00:00</updated><author><name>Jackson Sargent</name></author><author><name>Melanie Weber</name></author><link href="http://arxiv.org/abs/2109.09946" rel="alternate"/><summary>The need to address representation biases and sentencing disparities in legal
1178 case data has long been recognized. Here, we study the problem of identifying
1179 and measuring biases in large-scale legal case data from an algorithmic
1180 fairness perspective. Our approach utilizes two regression models: A baseline
1181 that represents the decisions of a "typical" judge as given by the data and a
1182 "fair" judge that applies one of three fairness concepts. Comparing the
1183 decisions of the "typical" judge and the "fair" judge allows for quantifying
1184 biases across demographic groups, as we demonstrate in four case studies on
1185 criminal data from Cook County (Illinois).
1186 </summary></entry><entry><id>http://arxiv.org/abs/2109.09906</id><title>Audio Interval Retrieval using Convolutional Neural Networks</title><updated>2021-09-23T09:06:49.506567+00:00</updated><author><name>Ievgeniia Kuzminykh</name></author><author><name>Dan Shevchuk</name></author><author><name>Stavros Shiaeles</name></author><author><name>Bogdan Ghita</name></author><link href="http://arxiv.org/abs/2109.09906" rel="alternate"/><summary>Modern streaming services are increasingly labeling videos based on their
1187 visual or audio content. This typically augments the use of technologies such
1188 as AI and ML by allowing to use natural speech for searching by keywords and
1189 video descriptions. Prior research has successfully provided a number of
1190 solutions for speech to text, in the case of a human speech, but this article
1191 aims to investigate possible solutions to retrieve sound events based on a
1192 natural language query, and estimate how effective and accurate they are. In
1193 this study, we specifically focus on the YamNet, AlexNet, and ResNet-50
1194 pre-trained models to automatically classify audio samples using their
1195 respective melspectrograms into a number of predefined classes. The predefined
1196 classes can represent sounds associated with actions within a video fragment.
1197 Two tests are conducted to evaluate the performance of the models on two
1198 separate problems: audio classification and intervals retrieval based on a
1199 natural language query. Results show that the benchmarked models are comparable
1200 in terms of performance, with YamNet slightly outperforming the other two
1201 models. YamNet was able to classify single fixed-size audio samples with 92.7%
1202 accuracy and 68.75% precision while its average accuracy on intervals retrieval
1203 was 71.62% and precision was 41.95%. The investigated method may be embedded
1204 into an automated event marking architecture for streaming services.
1205 </summary></entry><entry><id>http://arxiv.org/abs/2109.09904</id><title>Symbols as a Lingua Franca for Bridging Human-AI Chasm for Explainable and Advisable AI Systems</title><updated>2021-09-23T09:06:49.506026+00:00</updated><author><name>Subbarao Kambhampati</name></author><author><name>Sarath Sreedharan</name></author><author><name>Mudit Verma</name></author><author><name>Yantian Zha</name></author><author><name>Lin Guan</name></author><link href="http://arxiv.org/abs/2109.09904" rel="alternate"/><summary>Despite the surprising power of many modern AI systems that often learn their
1206 own representations, there is significant discontent about their inscrutability
1207 and the attendant problems in their ability to interact with humans. While
1208 alternatives such as neuro-symbolic approaches have been proposed, there is a
1209 lack of consensus on what they are about. There are often two independent
1210 motivations (i) symbols as a lingua franca for human-AI interaction and (ii)
1211 symbols as (system-produced) abstractions use in its internal reasoning. The
1212 jury is still out on whether AI systems will need to use symbols in their
1213 internal reasoning to achieve general intelligence capabilities. Whatever the
1214 answer there is, the need for (human-understandable) symbols in human-AI
1215 interaction seems quite compelling. Symbols, like emotions, may well not be
1216 sine qua non for intelligence per se, but they will be crucial for AI systems
1217 to interact with us humans--as we can neither turn off our emotions nor get by
1218 without our symbols. In particular, in many human-designed domains, humans
1219 would be interested in providing explicit (symbolic) knowledge and advice--and
1220 expect machine explanations in kind. This alone requires AI systems to at least
1221 do their I/O in symbolic terms. In this blue sky paper, we argue this point of
1222 view, and discuss research directions that need to be pursued to allow for this
1223 type of human-AI interaction.
1224 </summary></entry><entry><id>http://arxiv.org/abs/2109.09889</id><title>A Simple Unified Framework for Anomaly Detection in Deep Reinforcement Learning</title><updated>2021-09-23T09:06:49.505560+00:00</updated><author><name>Hongming Zhang</name></author><author><name>Ke Sun</name></author><author><name>Bo Xu</name></author><author><name>Linglong Kong</name></author><author><name>Martin Müller</name></author><link href="http://arxiv.org/abs/2109.09889" rel="alternate"/><summary>Abnormal states in deep reinforcement learning~(RL) are states that are
1225 beyond the scope of an RL policy. Such states may make the RL system unsafe and
1226 impede its deployment in real scenarios. In this paper, we propose a simple yet
1227 effective anomaly detection framework for deep RL algorithms that
1228 simultaneously considers random, adversarial and out-of-distribution~(OOD)
1229 state outliers. In particular, we attain the class-conditional distributions
1230 for each action class under the Gaussian assumption, and rely on these
1231 distributions to discriminate between inliers and outliers based on Mahalanobis
1232 Distance~(MD) and Robust Mahalanobis Distance. We conduct extensive experiments
1233 on Atari games that verify the effectiveness of our detection strategies. To
1234 the best of our knowledge, we present the first in-detail study of statistical
1235 and adversarial anomaly detection in deep RL algorithms. This simple unified
1236 anomaly detection paves the way towards deploying safe RL systems in real-world
1237 applications.
1238 </summary></entry><entry><id>http://arxiv.org/abs/2109.09876</id><title>Context-Specific Representation Abstraction for Deep Option Learning</title><updated>2021-09-23T09:06:49.505061+00:00</updated><author><name>Marwa Abdulhai</name></author><author><name>Dong-Ki Kim</name></author><author><name>Matthew Riemer</name></author><author><name>Miao Liu</name></author><author><name>Gerald Tesauro</name></author><author><name>Jonathan P. How</name></author><link href="http://arxiv.org/abs/2109.09876" rel="alternate"/><summary>Hierarchical reinforcement learning has focused on discovering temporally
1239 extended actions, such as options, that can provide benefits in problems
1240 requiring extensive exploration. One promising approach that learns these
1241 options end-to-end is the option-critic (OC) framework. We examine and show in
1242 this paper that OC does not decompose a problem into simpler sub-problems, but
1243 instead increases the size of the search over policy space with each option
1244 considering the entire state space during learning. This issue can result in
1245 practical limitations of this method, including sample inefficient learning. To
1246 address this problem, we introduce Context-Specific Representation Abstraction
1247 for Deep Option Learning (CRADOL), a new framework that considers both temporal
1248 abstraction and context-specific representation abstraction to effectively
1249 reduce the size of the search over policy space. Specifically, our method
1250 learns a factored belief state representation that enables each option to learn
1251 a policy over only a subsection of the state space. We test our method against
1252 hierarchical, non-hierarchical, and modular recurrent neural network baselines,
1253 demonstrating significant sample efficiency improvements in challenging
1254 partially observable environments.
1255 </summary></entry><entry><id>http://arxiv.org/abs/2109.09862</id><title>Language Identification with a Reciprocal Rank Classifier</title><updated>2021-09-23T09:06:49.504540+00:00</updated><author><name>Dominic Widdows</name></author><author><name>Chris Brew</name></author><link href="http://arxiv.org/abs/2109.09862" rel="alternate"/><summary>Language identification is a critical component of language processing
1256 pipelines (Jauhiainen et al.,2019) and is not a solved problem in real-world
1257 settings. We present a lightweight and effective language identifier that is
1258 robust to changes of domain and to the absence of copious training data.
1259 </summary></entry><entry><id>http://arxiv.org/abs/2109.09861</id><title>Generalized dynamic cognitive hierarchy models for strategic driving behavior</title><updated>2021-09-23T09:06:49.504112+00:00</updated><author><name>Atrisha Sarkar</name></author><author><name>Kate Larson</name></author><author><name>Krzysztof Czarnecki</name></author><link href="http://arxiv.org/abs/2109.09861" rel="alternate"/><summary>While there has been an increasing focus on the use of game theoretic models
1260 for autonomous driving, empirical evidence shows that there are still open
1261 questions around dealing with the challenges of common knowledge assumptions as
1262 well as modeling bounded rationality. To address some of these practical
1263 challenges, we develop a framework of generalized dynamic cognitive hierarchy
1264 for both modelling naturalistic human driving behavior as well as behavior
1265 planning for autonomous vehicles (AV). This framework is built upon a rich
1266 model of level-0 behavior through the use of automata strategies, an
1267 interpretable notion of bounded rationality through safety and maneuver
1268 satisficing, and a robust response for planning. Based on evaluation on two
1269 large naturalistic datasets as well as simulation of critical traffic
1270 scenarios, we show that i) automata strategies are well suited for level-0
1271 behavior in a dynamic level-k framework, and ii) the proposed robust response
1272 to a heterogeneous population of strategic and non-strategic reasoners can be
1273 an effective approach for game theoretic planning in AV.
1274 </summary></entry><entry><id>http://arxiv.org/abs/2109.09844</id><title>Assessing clinical utility of Machine Learning and Artificial Intelligence approaches to analyze speech recordings in Multiple Sclerosis: A Pilot Study</title><updated>2021-09-23T09:06:49.503318+00:00</updated><author><name>Emil Svoboda</name></author><author><name>Tomáš Bořil</name></author><author><name>Jan Rusz</name></author><author><name>Tereza Tykalová</name></author><author><name>Dana Horáková</name></author><author><name>Charles R.G. Guttman</name></author><author><name>Krastan B. Blagoev</name></author><author><name>Hiroto Hatabu</name></author><author><name>Vlad I. Valtchinov</name></author><link href="http://arxiv.org/abs/2109.09844" rel="alternate"/><summary>Background: An early diagnosis together with an accurate disease progression
1275 monitoring of multiple sclerosis is an important component of successful
1276 disease management. Prior studies have established that multiple sclerosis is
1277 correlated with speech discrepancies. Early research using objective acoustic
1278 measurements has discovered measurable dysarthria.
1279 </summary></entry><entry><id>http://arxiv.org/abs/2109.09833</id><title>Revisiting the Characteristics of Stochastic Gradient Noise and Dynamics</title><updated>2021-09-23T09:06:49.502515+00:00</updated><author><name>Yixin Wu</name></author><author><name>Rui Luo</name></author><author><name>Chen Zhang</name></author><author><name>Jun Wang</name></author><author><name>Yaodong Yang</name></author><link href="http://arxiv.org/abs/2109.09833" rel="alternate"/><summary>In this paper, we characterize the noise of stochastic gradients and analyze
1280 the noise-induced dynamics during training deep neural networks by
1281 gradient-based optimizers. Specifically, we firstly show that the stochastic
1282 gradient noise possesses finite variance, and therefore the classical Central
1283 Limit Theorem (CLT) applies; this indicates that the gradient noise is
1284 asymptotically Gaussian. Such an asymptotic result validates the wide-accepted
1285 assumption of Gaussian noise. We clarify that the recently observed phenomenon
1286 of heavy tails within gradient noise may not be intrinsic properties, but the
1287 consequence of insufficient mini-batch size; the gradient noise, which is a sum
1288 of limited i.i.d. random variables, has not reached the asymptotic regime of
1289 CLT, thus deviates from Gaussian. We quantitatively measure the goodness of
1290 Gaussian approximation of the noise, which supports our conclusion. Secondly,
1291 we analyze the noise-induced dynamics of stochastic gradient descent using the
1292 Langevin equation, granting for momentum hyperparameter in the optimizer with a
1293 physical interpretation. We then proceed to demonstrate the existence of the
1294 steady-state distribution of stochastic gradient descent and approximate the
1295 distribution at a small learning rate.
1296 </summary></entry><entry><id>http://arxiv.org/abs/2109.09829</id><title>Towards Energy-Efficient and Secure Edge AI: A Cross-Layer Framework</title><updated>2021-09-23T09:06:49.502052+00:00</updated><author><name>Muhammad Shafique</name></author><author><name>Alberto Marchisio</name></author><author><name>Rachmad Vidya Wicaksana Putra</name></author><author><name>Muhammad Abdullah Hanif</name></author><link href="http://arxiv.org/abs/2109.09829" rel="alternate"/><summary>The security and privacy concerns along with the amount of data that is
1297 required to be processed on regular basis has pushed processing to the edge of
1298 the computing systems. Deploying advanced Neural Networks (NN), such as deep
1299 neural networks (DNNs) and spiking neural networks (SNNs), that offer
1300 state-of-the-art results on resource-constrained edge devices is challenging
1301 due to the stringent memory and power/energy constraints. Moreover, these
1302 systems are required to maintain correct functionality under diverse security
1303 and reliability threats. This paper first discusses existing approaches to
1304 address energy efficiency, reliability, and security issues at different system
1305 layers, i.e., hardware (HW) and software (SW). Afterward, we discuss how to
1306 further improve the performance (latency) and the energy efficiency of Edge AI
1307 systems through HW/SW-level optimizations, such as pruning, quantization, and
1308 approximation. To address reliability threats (like permanent and transient
1309 faults), we highlight cost-effective mitigation techniques, like fault-aware
1310 training and mapping. Moreover, we briefly discuss effective detection and
1311 protection techniques to address security threats (like model and data
1312 corruption). Towards the end, we discuss how these techniques can be combined
1313 in an integrated cross-layer framework for realizing robust and
1314 energy-efficient Edge AI systems.
1315 </summary></entry><entry><id>http://arxiv.org/abs/2109.09825</id><title>Data Augmentation Methods for Anaphoric Zero Pronouns</title><updated>2021-09-23T09:06:49.501641+00:00</updated><author><name>Abdulrahman Aloraini</name></author><author><name>Massimo Poesio</name></author><link href="http://arxiv.org/abs/2109.09825" rel="alternate"/><summary>In pro-drop language like Arabic, Chinese, Italian, Japanese, Spanish, and
1316 many others, unrealized (null) arguments in certain syntactic positions can
1317 refer to a previously introduced entity, and are thus called anaphoric zero
1318 pronouns. The existing resources for studying anaphoric zero pronoun
1319 interpretation are however still limited. In this paper, we use five data
1320 augmentation methods to generate and detect anaphoric zero pronouns
1321 automatically. We use the augmented data as additional training materials for
1322 two anaphoric zero pronoun systems for Arabic. Our experimental results show
1323 that data augmentation improves the performance of the two systems, surpassing
1324 the state-of-the-art results.
1325 </summary></entry><entry><id>http://arxiv.org/abs/2109.09809</id><title>Counterfactual Instances Explain Little</title><updated>2021-09-23T09:06:49.501241+00:00</updated><author><name>Adam White</name></author><author><name>Artur d'Avila Garcez</name></author><link href="http://arxiv.org/abs/2109.09809" rel="alternate"/><summary>In many applications, it is important to be able to explain the decisions of
1326 machine learning systems. An increasingly popular approach has been to seek to
1327 provide \emph{counterfactual instance explanations}. These specify close
1328 possible worlds in which, contrary to the facts, a person receives their
1329 desired decision from the machine learning system. This paper will draw on
1330 literature from the philosophy of science to argue that a satisfactory
1331 explanation must consist of both counterfactual instances and a causal equation
1332 (or system of equations) that support the counterfactual instances. We will
1333 show that counterfactual instances by themselves explain little. We will
1334 further illustrate how explainable AI methods that provide both causal
1335 equations and counterfactual instances can successfully explain machine
1336 learning predictions.
1337 </summary></entry><entry><id>http://arxiv.org/abs/2109.09807</id><title>I Know You Can't See Me: Dynamic Occlusion-Aware Safety Validation of Strategic Planners for Autonomous Vehicles Using Hypergames</title><updated>2021-09-23T09:06:49.500759+00:00</updated><author><name>Maximilian Kahn</name></author><author><name>Atrisha Sarkar</name></author><author><name>Krzysztof Czarnecki</name></author><link href="http://arxiv.org/abs/2109.09807" rel="alternate"/><summary>A particular challenge for both autonomous and human driving is dealing with
1338 risk associated with dynamic occlusion, i.e., occlusion caused by other
1339 vehicles in traffic. Based on the theory of hypergames, we develop a novel
1340 multi-agent dynamic occlusion risk (DOR) measure for assessing situational risk
1341 in dynamic occlusion scenarios. Furthermore, we present a white-box,
1342 scenario-based, accelerated safety validation framework for assessing safety of
1343 strategic planners in AV. Based on evaluation over a large naturalistic
1344 database, our proposed validation method achieves a 4000% speedup compared to
1345 direct validation on naturalistic data, a more diverse coverage, and ability to
1346 generalize beyond the dataset and generate commonly observed dynamic occlusion
1347 crashes in traffic in an automated manner.
1348 </summary></entry><entry><id>http://arxiv.org/abs/2109.09791</id><title>Prediction of severe thunderstorm events with ensemble deep learning and radar data</title><updated>2021-09-23T09:06:49.499746+00:00</updated><author><name>Sabrina Guastavino</name></author><author><name>Michele Piana</name></author><author><name>Marco Tizzi</name></author><author><name>Federico Cassola</name></author><author><name>Antonio Iengo</name></author><author><name>Davide Sacchetti</name></author><author><name>Enrico Solazzo</name></author><author><name>Federico Benvenuto</name></author><link href="http://arxiv.org/abs/2109.09791" rel="alternate"/><summary>The problem of nowcasting extreme weather events can be addressed by applying
1349 either numerical methods for the solution of dynamic model equations or
1350 data-driven artificial intelligence algorithms. Within this latter framework,
1351 the present paper illustrates how a deep learning method, exploiting videos of
1352 radar reflectivity frames as input, can be used to realize a warning machine
1353 able to sound timely alarms of possible severe thunderstorm events. From a
1354 technical viewpoint, the computational core of this approach is the use of a
1355 value-weighted skill score for both transforming the probabilistic outcomes of
1356 the deep neural network into binary classification and assessing the
1357 forecasting performances. The warning machine has been validated against
1358 weather radar data recorded in the Liguria region, in Italy,
1359