latest paper

2022/10/15 survey review 共 773126 字,约 2209 分钟
虚无

on arxiv about survey or review

2022

TOC

2022-07

2022-07-01 02:06:03 - A Survey and Empirical Evaluation of Parallel Deep Learning Frameworks * *Daniel Nichols, Siddharth Singh, Shu-Huai Lin, Abhinav Bhatele* * `2111.04949v2` - [abs](http://arxiv.org/abs/2111.04949v2) - [pdf](http://arxiv.org/pdf/2111.04949v2) > The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks. These newer larger models have enabled researchers to advance state-of-the-art tools across a variety of fields. This phenomenon has spurred the development of algorithms for distributed training of neural networks over a larger number of hardware accelerators. In this paper, we discuss and compare current state-of-the-art frameworks for large scale distributed deep learning. First, we survey current practices in distributed learning and identify the different types of parallelism used. Then, we present empirical results comparing their performance on large image and language training tasks. Additionally, we address their statistical efficiency and memory consumption behavior. Based on our results, we discuss algorithmic and implementation portions of each framework which hinder performance.
2022-07-01 11:53:47 - Time Waits for No One! Analysis and Challenges of Temporal Misalignment - *Kelvin Luu, Daniel Khashabi, Suchin Gururangan, Karishma Mandyam, Noah A. Smith* - `2111.07408v2` - [abs](http://arxiv.org/abs/2111.07408v2) - [pdf](http://arxiv.org/pdf/2111.07408v2) > When an NLP model is trained on text data from one time period and tested or deployed on data from another, the resulting temporal misalignment can degrade end-task performance. In this work, we establish a suite of eight diverse tasks across different domains (social media, science papers, news, and reviews) and periods of time (spanning five years or more) to quantify the effects of temporal misalignment. Our study is focused on the ubiquitous setting where a pretrained model is optionally adapted through continued domain-specific pretraining, followed by task-specific finetuning. We establish a suite of tasks across multiple domains to study temporal misalignment in modern NLP systems. We find stronger effects of temporal misalignment on task performance than have been previously reported. We also find that, while temporal adaptation through continued pretraining can help, these gains are small compared to task-specific finetuning on data from the target time period. Our findings motivate continued research to improve temporal robustness of NLP models.
2022-07-01 12:17:59 - Image Quality Assessment for Magnetic Resonance Imaging - *Segrey Kastryulin, Jamil Zakirov, Nicola Pezzotti, Dmitry V. Dylov* - `2203.07809v2` - [abs](http://arxiv.org/abs/2203.07809v2) - [pdf](http://arxiv.org/pdf/2203.07809v2) > Image quality assessment (IQA) algorithms aim to reproduce the human's perception of the image quality. The growing popularity of image enhancement, generation, and recovery models instigated the development of many methods to assess their performance. However, most IQA solutions are designed to predict image quality in the general domain, with the applicability to specific areas, such as medical imaging, remaining questionable. Moreover, the selection of these IQA metrics for a specific task typically involves intentionally induced distortions, such as manually added noise or artificial blurring; yet, the chosen metrics are then used to judge the output of real-life computer vision models. In this work, we aspire to fill these gaps by carrying out the most extensive IQA evaluation study for Magnetic Resonance Imaging (MRI) to date (14,700 subjective scores). We use outputs of neural network models trained to solve problems relevant to MRI, including image reconstruction in the scan acceleration, motion correction, and denoising. Our emphasis is on reflecting the radiologist's perception of the reconstructed images, gauging the most diagnostically influential criteria for the quality of MRI scans: signal-to-noise ratio, contrast-to-noise ratio, and the presence of artifacts. Seven trained radiologists assess these distorted images, with their verdicts then correlated with 35 different image quality metrics (full-reference, no-reference, and distribution-based metrics considered). The top performers -- DISTS, HaarPSI, VSI, and FID-VGG16 -- are found to be efficient across three proposed quality criteria, for all considered anatomies and the target tasks.
2022-07-01 12:36:40 - Autonomy and Intelligence in the Computing Continuum: Challenges, Enablers, and Future Directions for Orchestration - *Henna Kokkonen, Lauri Lovén, Naser Hossein Motlagh, Juha Partala, Alfonso González-Gil, Ester Sola, Iñigo Angulo, Madhusanka Liyanage, Teemu Leppänen, Tri Nguyen, Víctor Casamayor Pujol, Panos Kostakos, Mehdi Bennis, Sasu Tarkoma, Schahram Dustdar, Susanna Pirttikangas, Jukka Riekki* - `2205.01423v2` - [abs](http://arxiv.org/abs/2205.01423v2) - [pdf](http://arxiv.org/pdf/2205.01423v2) > Future AI applications require performance, reliability and privacy that the existing, cloud-dependant system architectures cannot provide. In this article, we study orchestration in the device-edge-cloud continuum, and focus on AI for edge, that is, the AI methods used in resource orchestration. We claim that to support the constantly growing requirements of intelligent applications in the device-edge-cloud computing continuum, resource orchestration needs to embrace edge AI and emphasize local autonomy and intelligence. To justify the claim, we provide a general definition for continuum orchestration, and look at how current and emerging orchestration paradigms are suitable for the computing continuum. We describe certain major emerging research themes that may affect future orchestration, and provide an early vision of an orchestration paradigm that embraces those research themes. Finally, we survey current key edge AI methods and look at how they may contribute into fulfilling the vision of future continuum orchestration.
2022-07-01 13:32:21 - Multi-Document Keyphrase Extraction: Dataset, Baselines and Review - *Ori Shapira, Ramakanth Pasunuru, Ido Dagan, Yael Amsterdamer* - `2110.01073v2` - [abs](http://arxiv.org/abs/2110.01073v2) - [pdf](http://arxiv.org/pdf/2110.01073v2) > Keyphrase extraction has been extensively researched within the single-document setting, with an abundance of methods, datasets and applications. In contrast, multi-document keyphrase extraction has been infrequently studied, despite its utility for describing sets of documents, and its use in summarization. Moreover, no prior dataset exists for multi-document keyphrase extraction, hindering the progress of the task. Recent advances in multi-text processing make the task an even more appealing challenge to pursue. To stimulate this pursuit, we present here the first dataset for the task, MK-DUC-01, which can serve as a new benchmark, and test multiple keyphrase extraction baselines on our data. In addition, we provide a brief, yet comprehensive, literature review of the task.
2022-07-01 23:31:41 - Explainability in Graph Neural Networks: A Taxonomic Survey - *Hao Yuan, Haiyang Yu, Shurui Gui, Shuiwang Ji* - `2012.15445v3` - [abs](http://arxiv.org/abs/2012.15445v3) - [pdf](http://arxiv.org/pdf/2012.15445v3) > Deep learning methods are achieving ever-increasing performance on many artificial intelligence tasks. A major limitation of deep models is that they are not amenable to interpretability. This limitation can be circumvented by developing post hoc techniques to explain the predictions, giving rise to the area of explainability. Recently, explainability of deep models on images and texts has achieved significant progress. In the area of graph data, graph neural networks (GNNs) and their explainability are experiencing rapid developments. However, there is neither a unified treatment of GNN explainability methods, nor a standard benchmark and testbed for evaluations. In this survey, we provide a unified and taxonomic view of current GNN explainability methods. Our unified and taxonomic treatments of this subject shed lights on the commonalities and differences of existing methods and set the stage for further methodological developments. To facilitate evaluations, we generate a set of benchmark graph datasets specifically for GNN explainability. We summarize current datasets and metrics for evaluating GNN explainability. Altogether, this work provides a unified methodological treatment of GNN explainability and a standardized testbed for evaluations.
2022-07-02 08:14:12 - Natural Language Processing in-and-for Design Research - *L Siddharth, Lucienne T. M. Blessing, Jianxi Luo* - `2111.13827v3` - [abs](http://arxiv.org/abs/2111.13827v3) - [pdf](http://arxiv.org/pdf/2111.13827v3) > We review the scholarly contributions that utilise Natural Language Processing (NLP) techniques to support the design process. Using a heuristic approach, we gathered 223 articles that are published in 32 journals within the period 1991-present. We present state-of-the-art NLP in-and-for design research by reviewing these articles according to the type of natural language text sources: internal reports, design concepts, discourse transcripts, technical publications, consumer opinions, and others. Upon summarizing and identifying the gaps in these contributions, we utilise an existing design innovation framework to identify the applications that are currently being supported by NLP. We then propose a few methodological and theoretical directions for future NLP in-and-for design research.
2022-07-02 09:31:37 - Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review - *Hao Wang, Bin Guo, Yating Zeng, Yasan Ding, Chen Qiu, Ying Zhang, Lina Yao, Zhiwen Yu* - `2207.00782v1` - [abs](http://arxiv.org/abs/2207.00782v1) - [pdf](http://arxiv.org/pdf/2207.00782v1) > The intelligent dialogue system, aiming at communicating with humans harmoniously with natural language, is brilliant for promoting the advancement of human-machine interaction in the era of artificial intelligence. With the gradually complex human-computer interaction requirements (e.g., multimodal inputs, time sensitivity), it is difficult for traditional text-based dialogue system to meet the demands for more vivid and convenient interaction. Consequently, Visual Context Augmented Dialogue System (VAD), which has the potential to communicate with humans by perceiving and understanding multimodal information (i.e., visual context in images or videos, textual dialogue history), has become a predominant research paradigm. Benefiting from the consistency and complementarity between visual and textual context, VAD possesses the potential to generate engaging and context-aware responses. For depicting the development of VAD, we first characterize the concepts and unique features of VAD, and then present its generic system architecture to illustrate the system workflow. Subsequently, several research challenges and representative works are detailed investigated, followed by the summary of authoritative benchmarks. We conclude this paper by putting forward some open issues and promising research trends for VAD, e.g., the cognitive mechanisms of human-machine dialogue under cross-modal dialogue context, and knowledge-enhanced cross-modal semantic interaction.
2022-07-02 15:47:33 - The Linguistic Blind Spot of Value-Aligned Agency, Natural and Artificial - *Travis LaCroix* - `2207.00868v1` - [abs](http://arxiv.org/abs/2207.00868v1) - [pdf](http://arxiv.org/pdf/2207.00868v1) > The value-alignment problem for artificial intelligence (AI) asks how we can ensure that the 'values' (i.e., objective functions) of artificial systems are aligned with the values of humanity. In this paper, I argue that linguistic communication (natural language) is a necessary condition for robust value alignment. I discuss the consequences that the truth of this claim would have for research programmes that attempt to ensure value alignment for AI systems; or, more loftily, designing robustly beneficial or ethical artificial agents.
2022-07-02 16:13:13 - Neural Networks for Path Planning - *Salim Janji, Adrian Kliks* - `2207.00874v1` - [abs](http://arxiv.org/abs/2207.00874v1) - [pdf](http://arxiv.org/pdf/2207.00874v1) > The scientific community is able to present a new set of solutions to practical problems that substantially improve the performance of modern technology in terms of efficiency and speed of computation due to the advancement in neural networks architectures. We present the latest works considering the utilization of neural networks in robot path planning. Our survey shows the contrast between different formulations of the problems that consider different inputs, outputs, and environments and how different neural networks architectures are able to provide solutions to all of the presented problems.
2022-07-02 17:01:29 - From Seeing to Moving: A Survey on Learning for Visual Indoor Navigation (VIN) - *Xin Ye, Yezhou Yang* - `2002.11310v3` - [abs](http://arxiv.org/abs/2002.11310v3) - [pdf](http://arxiv.org/pdf/2002.11310v3) > Visual Indoor Navigation (VIN) task has drawn increasing attention from the data-driven machine learning communities especially with the recently reported success from learning-based methods. Due to the innate complexity of this task, researchers have tried approaching the problem from a variety of different angles, the full scope of which has not yet been captured within an overarching report. This survey first summarizes the representative work of learning-based approaches for the VIN task and then identifies and discusses lingering issues impeding the VIN performance, as well as motivates future research in these key areas worth exploring for the community.
2022-07-02 22:37:22 - Meta Learning for Natural Language Processing: A Survey - *Hung-yi Lee, Shang-Wen Li, Ngoc Thang Vu* - `2205.01500v2` - [abs](http://arxiv.org/abs/2205.01500v2) - [pdf](http://arxiv.org/pdf/2205.01500v2) > Deep learning has been the mainstream technique in natural language processing (NLP) area. However, the techniques require many labeled data and are less generalizable across domains. Meta-learning is an arising field in machine learning studying approaches to learn better learning algorithms. Approaches aim at improving algorithms in various aspects, including data efficiency and generalizability. Efficacy of approaches has been shown in many NLP tasks, but there is no systematic survey of these approaches in NLP, which hinders more researchers from joining the field. Our goal with this survey paper is to offer researchers pointers to relevant meta-learning works in NLP and attract more attention from the NLP community to drive future innovation. This paper first introduces the general concepts of meta-learning and the common approaches. Then we summarize task construction settings and application of meta-learning for various NLP problems and review the development of meta-learning in NLP community.
2022-07-03 02:37:31 - Using Hashtags to Analyze Purpose and Technology Application of Open-Source Project Related to COVID-19 - *Liang Tian, Chengzhi Zhang* - `2207.06219v1` - [abs](http://arxiv.org/abs/2207.06219v1) - [pdf](http://arxiv.org/pdf/2207.06219v1) > COVID-19 has had a profound impact on the lives of all human beings. Emerging technologies have made significant contributions to the fight against the pandemic. An extensive review of the application of technology will help facilitate future research and technology development to provide better solutions for future pandemics. In contrast to the extensive surveys of academic communities that have already been conducted, this study explores the IT community of practice. Using GitHub as the study target, we analyzed the main functionalities of the projects submitted during the pandemic. This study examines trends in projects with different functionalities and the relationship between functionalities and technologies. The study results show an imbalance in the number of projects with varying functionalities in the GitHub community, i.e., applications account for more than half of the projects. In contrast, other data analysis and AI projects account for a smaller share. This differs significantly from the survey of the academic community, where the findings focus more on cutting-edge technologies while projects in the community of practice use more mature technologies. The spontaneous behavior of developers may lack organization and make it challenging to target needs.
2022-07-03 02:57:22 - An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics - *Huan Yee Koh, Jiaxin Ju, Ming Liu, Shirui Pan* - `2207.00939v1` - [abs](http://arxiv.org/abs/2207.00939v1) - [pdf](http://arxiv.org/pdf/2207.00939v1) > Long documents such as academic articles and business reports have been the standard format to detail out important issues and complicated subjects that require extra attention. An automatic summarization system that can effectively condense long documents into short and concise texts to encapsulate the most important information would thus be significant in aiding the reader's comprehension. Recently, with the advent of neural architectures, significant research efforts have been made to advance automatic text summarization systems, and numerous studies on the challenges of extending these systems to the long document domain have emerged. In this survey, we provide a comprehensive overview of the research on long document summarization and a systematic evaluation across the three principal components of its research setting: benchmark datasets, summarization models, and evaluation metrics. For each component, we organize the literature within the context of long document summarization and conduct an empirical analysis to broaden the perspective on current research progress. The empirical analysis includes a study on the intrinsic characteristics of benchmark datasets, a multi-dimensional analysis of summarization models, and a review of the summarization evaluation metrics. Based on the overall findings, we conclude by proposing possible directions for future exploration in this rapidly growing field.
2022-07-03 10:35:18 - Supervised learning for improving the accuracy of robot-mounted 3D camera applied to human gait analysis - *Diego Guffanti, Alberto Brunete, Miguel Hernando, David Álvarez, Javier Rueda, Enrique Navarro* - `2207.01002v1` - [abs](http://arxiv.org/abs/2207.01002v1) - [pdf](http://arxiv.org/pdf/2207.01002v1) > The use of 3D cameras for gait analysis has been highly questioned due to the low accuracy they have demonstrated in the past. The objective of the study presented in this paper is to improve the accuracy of the estimations made by robot-mounted 3D cameras in human gait analysis by applying a supervised learning stage. The 3D camera was mounted in a mobile robot to obtain a longer walking distance. This study shows an improvement in detection of kinematic gait signals and gait descriptors by post-processing the raw estimations of the camera using artificial neural networks trained with the data obtained from a certified Vicon system. To achieve this, 37 healthy participants were recruited and data of 207 gait sequences were collected using an Orbbec Astra 3D camera. There are two basic possible approaches for training: using kinematic gait signals and using gait descriptors. The former seeks to improve the waveforms of kinematic gait signals by reducing the error and increasing the correlation with respect to the Vicon system. The second is a more direct approach, focusing on training the artificial neural networks using gait descriptors directly. The accuracy of the 3D camera was measured before and after training. In both training approaches, an improvement was observed. Kinematic gait signals showed lower errors and higher correlations with respect to the ground truth. The accuracy of the system to detect gait descriptors also showed a substantial improvement, mostly for kinematic descriptors rather than spatio-temporal. When comparing both training approaches, it was not possible to define which was the absolute best. Therefore, we believe that the selection of the training approach will depend on the purpose of the study to be conducted. This study reveals the great potential of 3D cameras and encourages the research community to continue exploring their use in gait analysis.
2022-07-04 06:21:01 - A Survey on Label-efficient Deep Segmentation: Bridging the Gap between Weak Supervision and Dense Prediction - *Wei Shen, Zelin Peng, Xuehui Wang, Huayu Wang, Jiazhong Cen, Dongsheng Jiang, Lingxi Xie, Xiaokang Yang, Qi Tian* - `2207.01223v1` - [abs](http://arxiv.org/abs/2207.01223v1) - [pdf](http://arxiv.org/pdf/2207.01223v1) > The rapid development of deep learning has made a great progress in segmentation, one of the fundamental tasks of computer vision. However, the current segmentation algorithms mostly rely on the availability of pixel-level annotations, which are often expensive, tedious, and laborious. To alleviate this burden, the past years have witnessed an increasing attention in building label-efficient, deep-learning-based segmentation algorithms. This paper offers a comprehensive review on label-efficient segmentation methods. To this end, we first develop a taxonomy to organize these methods according to the supervision provided by different types of weak labels (including no supervision, coarse supervision, incomplete supervision and noisy supervision) and supplemented by the types of segmentation problems (including semantic segmentation, instance segmentation and panoptic segmentation). Next, we summarize the existing label-efficient segmentation methods from a unified perspective that discusses an important question: how to bridge the gap between weak supervision and dense prediction -- the current methods are mostly based on heuristic priors, such as cross-pixel similarity, cross-label constraint, cross-view consistency, cross-image relation, etc. Finally, we share our opinions about the future research directions for label-efficient deep segmentation.
2022-07-04 08:23:28 - Is it possible not to cheat on the Turing Test_Exploring the potential and challenges for true natural language 'understanding' by computers - *Lize Alberts* - `2206.14672v3` - [abs](http://arxiv.org/abs/2206.14672v3) - [pdf](http://arxiv.org/pdf/2206.14672v3) > The increasing sophistication of NLP models has renewed optimism regarding machines achieving a full human-like command of natural language. Whilst work in NLP/NLU may have made great strides in that direction, the lack of conceptual clarity in how 'understanding' is used in this and other disciplines have made it difficult to discern how close we actually are. A critical, interdisciplinary review of current approaches and remaining challenges is yet to be carried out. Beyond linguistic knowledge, this requires considering our species-specific capabilities to categorize, memorize, label and communicate our (sufficiently similar) embodied and situated experiences. Moreover, gauging the practical constraints requires critically analyzing the technical capabilities of current models, as well as deeper philosophical reflection on theoretical possibilities and limitations. In this paper, I unite all of these perspectives -- the philosophical, cognitive-linguistic, and technical -- to unpack the challenges involved in approaching true (human-like) language understanding. By unpacking the theoretical assumptions inherent in current approaches, I hope to illustrate how far we actually are from achieving this goal, if indeed it is the goal.
2022-07-04 08:37:27 - Feasibility Study of Neural ODE and DAE Modules for Power System Dynamic Component Modeling - *Tannan Xiao, Ying Chen, Shaowei Huang, Tirui He, Huizhe Guan* - `2110.12981v5` - [abs](http://arxiv.org/abs/2110.12981v5) - [pdf](http://arxiv.org/pdf/2110.12981v5) > In the context of high penetration of renewables, the need to build dynamic models of power system components based on accessible measurement data has become urgent. To address this challenge, firstly, a neural ordinary differential equations (ODE) module and a neural differential-algebraic equations (DAE) module are proposed to form a data-driven modeling framework that accurately captures components' dynamic characteristics and flexibly adapts to various interface settings. Secondly, analytical models and data-driven models learned by the neural ODE and DAE modules are integrated together and simulated simultaneously using unified transient stability simulation methods. Finally, the neural ODE and DAE modules are implemented with Python and made public on GitHub. Using the portal measurements, three simple but representative cases of excitation controller modeling, photovoltaic power plant modeling, and equivalent load modeling of a regional power network are carried out in the IEEE-39 system and 2383wp system. Neural dynamic model-integrated simulations are compared with the original model-based ones to verify the feasibility and potentiality of the proposed neural ODE and DAE modules.
2022-07-04 08:40:52 - CV 3315 Is All You Need : Semantic Segmentation Competition - *Akide Liu, Zihan Wang* - `2206.12571v2` - [abs](http://arxiv.org/abs/2206.12571v2) - [pdf](http://arxiv.org/pdf/2206.12571v2) > This competition focus on Urban-Sense Segmentation based on the vehicle camera view. Class highly unbalanced Urban-Sense images dataset challenge the existing solutions and further studies. Deep Conventional neural network-based semantic segmentation methods such as encoder-decoder architecture and multi-scale and pyramid-based approaches become flexible solutions applicable to real-world applications. In this competition, we mainly review the literature and conduct experiments on transformer-driven methods especially SegFormer, to achieve an optimal trade-off between performance and efficiency. For example, SegFormer-B0 achieved 74.6% mIoU with the smallest FLOPS, 15.6G, and the largest model, SegFormer- B5 archived 80.2% mIoU. According to multiple factors, including individual case failure analysis, individual class performance, training pressure and efficiency estimation, the final candidate model for the competition is SegFormer- B2 with 50.6 GFLOPS and 78.5% mIoU evaluated on the testing set. Checkout our code implementation at https://vmv.re/cv3315.
2022-07-04 09:54:50 - Generating Sparse Counterfactual Explanations For Multivariate Time Series - *Jana Lang, Martin Giese, Winfried Ilg, Sebastian Otte* - `2206.00931v2` - [abs](http://arxiv.org/abs/2206.00931v2) - [pdf](http://arxiv.org/pdf/2206.00931v2) > Since neural networks play an increasingly important role in critical sectors, explaining network predictions has become a key research topic. Counterfactual explanations can help to understand why classifier models decide for particular class assignments and, moreover, how the respective input samples would have to be modified such that the class prediction changes. Previous approaches mainly focus on image and tabular data. In this work we propose SPARCE, a generative adversarial network (GAN) architecture that generates SPARse Counterfactual Explanations for multivariate time series. Our approach provides a custom sparsity layer and regularizes the counterfactual loss function in terms of similarity, sparsity, and smoothness of trajectories. We evaluate our approach on real-world human motion datasets as well as a synthetic time series interpretability benchmark. Although we make significantly sparser modifications than other approaches, we achieve comparable or better performance on all metrics. Moreover, we demonstrate that our approach predominantly modifies salient time steps and features, leaving non-salient inputs untouched.
2022-07-04 10:29:20 - Multilingual Disinformation Detection for Digital Advertising - *Zofia Trstanova, Nadir El Manouzi, Maryline Chen, Andre L. V. da Cunha, Sergei Ivanov* - `2207.10649v1` - [abs](http://arxiv.org/abs/2207.10649v1) - [pdf](http://arxiv.org/pdf/2207.10649v1) > In today's world, the presence of online disinformation and propaganda is more widespread than ever. Independent publishers are funded mostly via digital advertising, which is unfortunately also the case for those publishing disinformation content. The question of how to remove such publishers from advertising inventory has long been ignored, despite the negative impact on the open internet. In this work, we make the first step towards quickly detecting and red-flagging websites that potentially manipulate the public with disinformation. We build a machine learning model based on multilingual text embeddings that first determines whether the page mentions a topic of interest, then estimates the likelihood of the content being malicious, creating a shortlist of publishers that will be reviewed by human experts. Our system empowers internal teams to proactively, rather than defensively, blacklist unsafe content, thus protecting the reputation of the advertisement provider.
2022-07-04 13:14:37 - GAN-based generation of realistic 3D data: A systematic review and taxonomy - *André Ferreira, Jianning Li, Kelsey L. Pomykala, Jens Kleesiek, Victor Alves, Jan Egger* - `2207.01390v1` - [abs](http://arxiv.org/abs/2207.01390v1) - [pdf](http://arxiv.org/pdf/2207.01390v1) > Data has become the most valuable resource in today's world. With the massive proliferation of data-driven algorithms, such as deep learning-based approaches, the availability of data is of great interest. In this context, high-quality training, validation and testing datasets are particularly needed. Volumetric data is a very important resource in medicine, as it ranges from disease diagnoses to therapy monitoring. When the dataset is sufficient, models can be trained to help doctors with these tasks. Unfortunately, there are scenarios and applications where large amounts of data is unavailable. For example, in the medical field, rare diseases and privacy issues can lead to restricted data availability. In non-medical fields, the high cost of obtaining a sufficient amount of high-quality data can also be a concern. A solution to these problems can be the generation of synthetic data to perform data augmentation in combination with other more traditional methods of data augmentation. Therefore, most of the publications on 3D Generative Adversarial Networks (GANs) are within the medical domain. The existence of mechanisms to generate realistic synthetic data is a good asset to overcome this challenge, especially in healthcare, as the data must be of good quality and close to reality, i.e. realistic, and without privacy issues. In this review, we provide a summary of works that generate realistic 3D synthetic data using GANs. We therefore outline GAN-based methods in these areas with common architectures, advantages and disadvantages. We present a novel taxonomy, evaluations, challenges and research opportunities to provide a holistic overview of the current state of GANs in medicine and other fields.
2022-07-04 14:51:59 - Physics-informed compressed sensing for PC-MRI: an inverse Navier-Stokes problem - *Alexandros Kontogiannis, Matthew P. Juniper* - `2207.01466v1` - [abs](http://arxiv.org/abs/2207.01466v1) - [pdf](http://arxiv.org/pdf/2207.01466v1) > We formulate a physics-informed compressed sensing (PICS) method for the reconstruction of velocity fields from noisy and sparse phase-contrast magnetic resonance signals. The method solves an inverse Navier-Stokes boundary value problem, which permits us to jointly reconstruct and segment the velocity field, and at the same time infer hidden quantities such as the hydrodynamic pressure and the wall shear stress. Using a Bayesian framework, we regularize the problem by introducing a priori information about the unknown parameters in the form of Gaussian random fields. This prior information is updated using the Navier-Stokes problem, an energy-based segmentation functional, and by requiring that the reconstruction is consistent with the $k$-space signals. We create an algorithm that solves this reconstruction problem, and test it for noisy and sparse $k$-space signals of the flow through a converging nozzle. We find that the method is capable of reconstructing and segmenting the velocity fields from sparsely-sampled (15% $k$-space coverage), low ($\sim$$10$) signal-to-noise ratio (SNR) signals, and that the reconstructed velocity field compares well with that derived from fully-sampled (100% $k$-space coverage) high ($>40$) SNR signals of the same flow.
2022-07-04 18:55:10 - Automating the Design and Development of Gradient Descent Trained Expert System Networks - *Jeremy Straub* - `2207.02845v1` - [abs](http://arxiv.org/abs/2207.02845v1) - [pdf](http://arxiv.org/pdf/2207.02845v1) > Prior work introduced a gradient descent trained expert system that conceptually combines the learning capabilities of neural networks with the understandability and defensible logic of an expert system. This system was shown to be able to learn patterns from data and to perform decision-making at levels rivaling those reported by neural network systems. The principal limitation of the approach, though, was the necessity for the manual development of a rule-fact network (which is then trained using backpropagation). This paper proposes a technique for overcoming this significant limitation, as compared to neural networks. Specifically, this paper proposes the use of larger and denser-than-application need rule-fact networks which are trained, pruned, manually reviewed and then re-trained for use. Multiple types of networks are evaluated under multiple operating conditions and these results are presented and assessed. Based on these individual experimental condition assessments, the proposed technique is evaluated. The data presented shows that error rates as low as 3.9% (mean, 1.2% median) can be obtained, demonstrating the efficacy of this technique for many applications.
2022-07-04 19:25:15 - Location reference recognition from texts: A survey and comparison - *Xuke Hu, Zhiyong Zhou, Hao Li, Yingjie Hu, Fuqiang Gu, Jens Kersten, Hongchao Fan, Friederike Klan* - `2207.01683v1` - [abs](http://arxiv.org/abs/2207.01683v1) - [pdf](http://arxiv.org/pdf/2207.01683v1) > A vast amount of location information exists in unstructured texts, such as social media posts, news stories, scientific articles, web pages, travel blogs, and historical archives. Geoparsing refers to the process of recognizing location references from texts and identifying their geospatial representations. While geoparsing can benefit many domains, a summary of the specific applications is still missing. Further, there lacks a comprehensive review and comparison of existing approaches for location reference recognition, which is the first and a core step of geoparsing. To fill these research gaps, this review first summarizes seven typical application domains of geoparsing: geographic information retrieval, disaster management, disease surveillance, traffic management, spatial humanities, tourism management, and crime management. We then review existing approaches for location reference recognition by categorizing these approaches into four groups based on their underlying functional principle: rule-based, gazetteer matching-based, statistical learning-based, and hybrid approaches. Next, we thoroughly evaluate the correctness and computational efficiency of the 27 most widely used approaches for location reference recognition based on 26 public datasets with different types of texts (e.g., social media posts and news stories) containing 39,736 location references across the world. Results from this thorough evaluation can help inform future methodological developments for location reference recognition, and can help guide the selection of proper approaches based on application needs.
2022-07-05 05:10:59 - Deriving Surface Resistivity from Polarimetric SAR Data Using Dual-Input UNet - *Bibin Wilson, Rajiv Kumar, Narayanarao Bhogapurapu, Anand Singh, Amit Sethi* - `2207.01811v1` - [abs](http://arxiv.org/abs/2207.01811v1) - [pdf](http://arxiv.org/pdf/2207.01811v1) > Traditional survey methods for finding surface resistivity are time-consuming and labor intensive. Very few studies have focused on finding the resistivity/conductivity using remote sensing data and deep learning techniques. In this line of work, we assessed the correlation between surface resistivity and Synthetic Aperture Radar (SAR) by applying various deep learning methods and tested our hypothesis in the Coso Geothermal Area, USA. For detecting the resistivity, L-band full polarimetric SAR data acquired by UAVSAR were used, and MT (Magnetotellurics) inverted resistivity data of the area were used as the ground truth. We conducted experiments to compare various deep learning architectures and suggest the use of Dual Input UNet (DI-UNet) architecture. DI-UNet uses a deep learning architecture to predict the resistivity using full polarimetric SAR data by promising a quick survey addition to the traditional method. Our proposed approach accomplished improved outcomes for the mapping of MT resistivity from SAR data.
2022-07-05 07:26:09 - Creativity and Machine Learning: A Survey - *Giorgio Franceschelli, Mirco Musolesi* - `2104.02726v3` - [abs](http://arxiv.org/abs/2104.02726v3) - [pdf](http://arxiv.org/pdf/2104.02726v3) > There is a growing interest in the area of machine learning and creativity. This survey presents an overview of the history and the state of the art of computational creativity theories, key machine learning techniques (including generative deep learning), and corresponding automatic evaluation methods. After presenting a critical discussion of the key contributions in this area, we outline the current research challenges and emerging opportunities in this field.
2022-07-05 07:43:44 - MReD: A Meta-Review Dataset for Structure-Controllable Text Generation - *Chenhui Shen, Liying Cheng, Ran Zhou, Lidong Bing, Yang You, Luo Si* - `2110.07474v6` - [abs](http://arxiv.org/abs/2110.07474v6) - [pdf](http://arxiv.org/pdf/2110.07474v6) > When directly using existing text generation datasets for controllable generation, we are facing the problem of not having the domain knowledge and thus the aspects that could be controlled are limited. A typical example is when using CNN/Daily Mail dataset for controllable text summarization, there is no guided information on the emphasis of summary sentences. A more useful text generator should leverage both the input text and the control signal to guide the generation, which can only be built with a deep understanding of the domain knowledge. Motivated by this vision, our paper introduces a new text generation dataset, named MReD. Our new dataset consists of 7,089 meta-reviews and all its 45k meta-review sentences are manually annotated with one of the 9 carefully defined categories, including abstract, strength, decision, etc. We present experimental results on start-of-the-art summarization models, and propose methods for structure-controlled generation with both extractive and abstractive models using our annotated data. By exploring various settings and analyzing the model behavior with respect to the control signal, we demonstrate the challenges of our proposed task and the values of our dataset MReD. Meanwhile, MReD also allows us to have a better understanding of the meta-review domain.
2022-07-05 08:38:41 - Recent Deep Semi-supervised Learning Approaches and Related Works - *Gyeongho Kim* - `2106.11528v2` - [abs](http://arxiv.org/abs/2106.11528v2) - [pdf](http://arxiv.org/pdf/2106.11528v2) > The author of this work proposes an overview of the recent semi-supervised learning approaches and related works. Despite the remarkable success of neural networks in various applications, there exist few formidable constraints including the need for a large amount of labeled data. Therefore, semi-supervised learning, which is a learning scheme in which the scarce labels and a larger amount of unlabeled data are utilized to train models (e.g., deep neural networks) is getting more important. Based on the key assumptions of semi-supervised learning, which are the manifold assumption, cluster assumption, and continuity assumption, the work reviews the recent semi-supervised learning approaches. In particular, the methods in regard to using deep neural networks in a semi-supervised learning setting are primarily discussed. In addition, the existing works are first classified based on the underlying idea and explained, and then the holistic approaches that unify the aforementioned ideas are detailed.
2022-07-05 10:44:26 - Making sense of spoken plurals - *Elnaz Shafaei-Bajestan, Peter Uhrig, R. Harald Baayen* - `2207.01947v1` - [abs](http://arxiv.org/abs/2207.01947v1) - [pdf](http://arxiv.org/pdf/2207.01947v1) > Distributional semantics offers new ways to study the semantics of morphology. This study focuses on the semantics of noun singulars and their plural inflectional variants in English. Our goal is to compare two models for the conceptualization of plurality. One model (FRACSS) proposes that all singular-plural pairs should be taken into account when predicting plural semantics from singular semantics. The other model (CCA) argues that conceptualization for plurality depends primarily on the semantic class of the base word. We compare the two models on the basis of how well the speech signal of plural tokens in a large corpus of spoken American English aligns with the semantic vectors predicted by the two models. Two measures are employed: the performance of a form-to-meaning mapping and the correlations between form distances and meaning distances. Results converge on a superior alignment for CCA. Our results suggest that usage-based approaches to pluralization in which a given word's own semantic neighborhood is given priority outperform theories according to which pluralization is conceptualized as a process building on high-level abstraction. We see that what has often been conceived of as a highly abstract concept, [+plural], is better captured via a family of mid-level partial generalizations.
2022-07-05 12:44:54 - Towards trustworthy Energy Disaggregation: A review of challenges, methods and perspectives for Non-Intrusive Load Monitoring - *Maria Kaselimi, Eftychios Protopapadakis, Athanasios Voulodimos, Nikolaos Doulamis, Anastasios Doulamis* - `2207.02009v1` - [abs](http://arxiv.org/abs/2207.02009v1) - [pdf](http://arxiv.org/pdf/2207.02009v1) > Non-intrusive load monitoring (NILM) is the task of disaggregating the total power consumption into its individual sub-components. Over the years, signal processing and machine learning algorithms have been combined to achieve this. A lot of publications and extensive research works are performed on energy disaggregation or NILM for the state-of-the-art methods to reach on the desirable performance. The initial interest of the scientific community to formulate and describe mathematically the NILM problem using machine learning tools has now shifted into a more practical NILM. Nowadays, we are in the mature NILM period where there is an attempt for NILM to be applied in real-life application scenarios. Thus, complexity of the algorithms, transferability, reliability, practicality and in general trustworthiness are the main issues of interest. This review narrows the gap between the early immature NILM era and the mature one. In particular, the paper provides a comprehensive literature review of the NILM methods for residential appliances only. The paper analyzes, summarizes and presents the outcomes of a large number of recently published scholarly articles. Also, the paper discusses the highlights of these methods and introduces the research dilemmas that should be taken into consideration by researchers to apply NILM methods. Finally, we show the need for transferring the traditional disaggregation models into a practical and trustworthy framework.
2022-07-05 14:13:22 - Image Amodal Completion: A Survey - *Jiayang Ao, Krista A. Ehinger, Qiuhong Ke* - `2207.02062v1` - [abs](http://arxiv.org/abs/2207.02062v1) - [pdf](http://arxiv.org/pdf/2207.02062v1) > Existing computer vision systems can compete with humans in understanding the visible parts of objects, but still fall far short of humans when it comes to depicting the invisible parts of partially occluded objects. Image amodal completion aims to equip computers with human-like amodal completion functions to understand an intact object despite it being partially occluded. The main purpose of this survey is to provide an intuitive understanding of the research hotspots, key technologies and future trends in the field of image amodal completion. Firstly, we present a comprehensive review of the latest literature in this emerging field, exploring three key tasks in image amodal completion, including amodal shape completion, amodal appearance completion, and order perception. Then we examine popular datasets related to image amodal completion along with their common data collection methods and evaluation metrics. Finally, we discuss real-world applications and future research directions for image amodal completion, facilitating the reader's understanding of the challenges of existing technologies and upcoming research trends.
2022-07-05 15:08:12 - Generating Game Levels of Diverse Behaviour Engagement - *Keyuan Zhang, Jiayu Bai, Jialin Liu* - `2207.02100v1` - [abs](http://arxiv.org/abs/2207.02100v1) - [pdf](http://arxiv.org/pdf/2207.02100v1) > Recent years, there has been growing interests in experience-driven procedural level generation. Various metrics have been formulated to model player experience and help generate personalised levels. In this work, we question whether experience metrics can adapt to agents with different personas. We start by reviewing existing metrics for evaluating game levels. Then, focusing on platformer games, we design a framework integrating various agents and evaluation metrics. Experimental studies on \emph{Super Mario Bros.} indicate that using the same evaluation metrics but agents with different personas can generate levels for particular persona. It implies that, for simple games, using a game-playing agent of specific player archetype as a level tester is probably all we need to generate levels of diverse behaviour engagement.
2022-07-05 16:14:36 - Deep Learning for Finger Vein Recognition: A Brief Survey of Recent Trend - *Renye Zhang, Yimin Yin, Wanxia Deng, Chen Li, Jinghua Zhang* - `2207.02148v1` - [abs](http://arxiv.org/abs/2207.02148v1) - [pdf](http://arxiv.org/pdf/2207.02148v1) > Finger vein image recognition technology plays an important role in biometric recognition and has been successfully applied in many fields. Because veins are buried beneath the skin tissue, finger vein image recognition has an unparalleled advantage, which is not easily disturbed by external factors. This review summarizes 46 papers about deep learning for finger vein image recognition from 2017 to 2021. These papers are summarized according to the tasks of deep neural networks. Besides, we present the challenges and potential development directions of finger vein image recognition.
2022-07-05 16:28:47 - A Comprehensive Review of Visual-Textual Sentiment Analysis from Social Media Networks - *Israa Khalaf Salman Al-Tameemi, Mohammad-Reza Feizi-Derakhshi, Saeed Pashazadeh, Mohammad Asadpour* - `2207.02160v1` - [abs](http://arxiv.org/abs/2207.02160v1) - [pdf](http://arxiv.org/pdf/2207.02160v1) > Social media networks have become a significant aspect of people's lives, serving as a platform for their ideas, opinions and emotions. Consequently, automated sentiment analysis (SA) is critical for recognising people's feelings in ways that other information sources cannot. The analysis of these feelings revealed various applications, including brand evaluations, YouTube film reviews and healthcare applications. As social media continues to develop, people post a massive amount of information in different forms, including text, photos, audio and video. Thus, traditional SA algorithms have become limited, as they do not consider the expressiveness of other modalities. By including such characteristics from various material sources, these multimodal data streams provide new opportunities for optimising the expected results beyond text-based SA. Our study focuses on the forefront field of multimodal SA, which examines visual and textual data posted on social media networks. Many people are more likely to utilise this information to express themselves on these platforms. To serve as a resource for academics in this rapidly growing field, we introduce a comprehensive overview of textual and visual SA, including data pre-processing, feature extraction techniques, sentiment benchmark datasets, and the efficacy of multiple classification methodologies suited to each field. We also provide a brief introduction of the most frequently utilised data fusion strategies and a summary of existing research on visual-textual SA. Finally, we highlight the most significant challenges and investigate several important sentiment applications.
2022-07-05 18:15:28 - Video-based Surgical Skills Assessment using Long term Tool Tracking - *Mona Fathollahi, Mohammad Hasan Sarhan, Ramon Pena, Lela DiMonte, Anshu Gupta, Aishani Ataliwala, Jocelyn Barker* - `2207.02247v1` - [abs](http://arxiv.org/abs/2207.02247v1) - [pdf](http://arxiv.org/pdf/2207.02247v1) > Mastering the technical skills required to perform surgery is an extremely challenging task. Video-based assessment allows surgeons to receive feedback on their technical skills to facilitate learning and development. Currently, this feedback comes primarily from manual video review, which is time-intensive and limits the feasibility of tracking a surgeon's progress over many cases. In this work, we introduce a motion-based approach to automatically assess surgical skills from surgical case video feed. The proposed pipeline first tracks surgical tools reliably to create motion trajectories and then uses those trajectories to predict surgeon technical skill levels. The tracking algorithm employs a simple yet effective re-identification module that improves ID-switch compared to other state-of-the-art methods. This is critical for creating reliable tool trajectories when instruments regularly move on- and off-screen or are periodically obscured. The motion-based classification model employs a state-of-the-art self-attention transformer network to capture short- and long-term motion patterns that are essential for skill evaluation. The proposed method is evaluated on an in-vivo (Cholec80) dataset where an expert-rated GOALS skill assessment of the Calot Triangle Dissection is used as a quantitative skill measure. We compare transformer-based skill assessment with traditional machine learning approaches using the proposed and state-of-the-art tracking. Our result suggests that using motion trajectories from reliable tracking methods is beneficial for assessing surgeon skills based solely on video streams.
2022-07-05 18:23:20 - Learning Task Embeddings for Teamwork Adaptation in Multi-Agent Reinforcement Learning - *Lukas Schäfer, Filippos Christianos, Amos Storkey, Stefano V. Albrecht* - `2207.02249v1` - [abs](http://arxiv.org/abs/2207.02249v1) - [pdf](http://arxiv.org/pdf/2207.02249v1) > Successful deployment of multi-agent reinforcement learning often requires agents to adapt their behaviour. In this work, we discuss the problem of teamwork adaptation in which a team of agents needs to adapt their policies to solve novel tasks with limited fine-tuning. Motivated by the intuition that agents need to be able to identify and distinguish tasks in order to adapt their behaviour to the current task, we propose to learn multi-agent task embeddings (MATE). These task embeddings are trained using an encoder-decoder architecture optimised for reconstruction of the transition and reward functions which uniquely identify tasks. We show that a team of agents is able to adapt to novel tasks when provided with task embeddings. We propose three MATE training paradigms: independent MATE, centralised MATE, and mixed MATE which vary in the information used for the task encoding. We show that the embeddings learned by MATE identify tasks and provide useful information which agents leverage during adaptation to novel tasks.
2022-07-05 19:31:05 - Deep Learning Serves Traffic Safety Analysis: A Forward-looking Review - *Abolfazl Razi, Xiwen Chen, Huayu Li, Hao Wang, Brendan Russo, Yan Chen, Hongbin Yu* - `2203.10939v2` - [abs](http://arxiv.org/abs/2203.10939v2) - [pdf](http://arxiv.org/pdf/2203.10939v2) > This paper explores Deep Learning (DL) methods that are used or have the potential to be used for traffic video analysis, emphasizing driving safety for both Autonomous Vehicles (AVs) and human-operated vehicles. We present a typical processing pipeline, which can be used to understand and interpret traffic videos by extracting operational safety metrics and providing general hints and guidelines to improve traffic safety. This processing framework includes several steps, including video enhancement, video stabilization, semantic and incident segmentation, object detection and classification, trajectory extraction, speed estimation, event analysis, modeling and anomaly detection. Our main goal is to guide traffic analysts to develop their own custom-built processing frameworks by selecting the best choices for each step and offering new designs for the lacking modules by providing a comparative analysis of the most successful conventional and DL-based algorithms proposed for each step. We also review existing open-source tools and public datasets that can help train DL models. To be more specific, we review exemplary traffic problems and mentioned requires steps for each problem. Besides, we investigate connections to the closely related research areas of drivers' cognition evaluation, Crowd-sourcing-based monitoring systems, Edge Computing in roadside infrastructures, Automated Driving Systems (ADS)-equipped vehicles, and highlight the missing gaps. Finally, we review commercial implementations of traffic monitoring systems, their future outlook, and open problems and remaining challenges for widespread use of such systems.
2022-07-05 22:07:26 - Federated and Transfer Learning: A Survey on Adversaries and Defense Mechanisms - *Ehsan Hallaji, Roozbeh Razavi-Far, Mehrdad Saif* - `2207.02337v1` - [abs](http://arxiv.org/abs/2207.02337v1) - [pdf](http://arxiv.org/pdf/2207.02337v1) > The advent of federated learning has facilitated large-scale data exchange amongst machine learning models while maintaining privacy. Despite its brief history, federated learning is rapidly evolving to make wider use more practical. One of the most significant advancements in this domain is the incorporation of transfer learning into federated learning, which overcomes fundamental constraints of primary federated learning, particularly in terms of security. This chapter performs a comprehensive survey on the intersection of federated and transfer learning from a security point of view. The main goal of this study is to uncover potential vulnerabilities and defense mechanisms that might compromise the privacy and performance of systems that use federated and transfer learning.
2022-07-06 00:56:06 - A Comprehensive Review on Deep Supervision: Theories and Applications - *Renjie Li, Xinyi Wang, Guan Huang, Wenli Yang, Kaining Zhang, Xiaotong Gu, Son N. Tran, Saurabh Garg, Jane Alty, Quan Bai* - `2207.02376v1` - [abs](http://arxiv.org/abs/2207.02376v1) - [pdf](http://arxiv.org/pdf/2207.02376v1) > Deep supervision, or known as 'intermediate supervision' or 'auxiliary supervision', is to add supervision at hidden layers of a neural network. This technique has been increasingly applied in deep neural network learning systems for various computer vision applications recently. There is a consensus that deep supervision helps improve neural network performance by alleviating the gradient vanishing problem, as one of the many strengths of deep supervision. Besides, in different computer vision applications, deep supervision can be applied in different ways. How to make the most use of deep supervision to improve network performance in different applications has not been thoroughly investigated. In this paper, we provide a comprehensive in-depth review of deep supervision in both theories and applications. We propose a new classification of different deep supervision networks, and discuss advantages and limitations of current deep supervision networks in computer vision applications.
2022-07-06 06:03:10 - Brain-inspired probabilistic generative model for double articulation analysis of spoken language - *Akira Taniguchi, Maoko Muro, Hiroshi Yamakawa, Tadahiro Taniguchi* - `2207.02457v1` - [abs](http://arxiv.org/abs/2207.02457v1) - [pdf](http://arxiv.org/pdf/2207.02457v1) > The human brain, among its several functions, analyzes the double articulation structure in spoken language, i.e., double articulation analysis (DAA). A hierarchical structure in which words are connected to form a sentence and words are composed of phonemes or syllables is called a double articulation structure. Where and how DAA is performed in the human brain has not been established, although some insights have been obtained. In addition, existing computational models based on a probabilistic generative model (PGM) do not incorporate neuroscientific findings, and their consistency with the brain has not been previously discussed. This study compared, mapped, and integrated these existing computational models with neuroscientific findings to bridge this gap, and the findings are relevant for future applications and further research. This study proposes a PGM for a DAA hypothesis that can be realized in the brain based on the outcomes of several neuroscientific surveys. The study involved (i) investigation and organization of anatomical structures related to spoken language processing, and (ii) design of a PGM that matches the anatomy and functions of the region of interest. Therefore, this study provides novel insights that will be foundational to further exploring DAA in the brain.
2022-07-06 06:37:31 - An F-shape Click Model for Information Retrieval on Multi-block Mobile Pages - *Jianghao Lin, Lingyue Fu, Weiwen Liu, Ruiming Tang, Weinan Zhang, Rui Zhang, Yong Yu* - `2206.08604v2` - [abs](http://arxiv.org/abs/2206.08604v2) - [pdf](http://arxiv.org/pdf/2206.08604v2) > To provide click simulation or relevance estimation based on users' implicit interaction feedback, click models have been much studied during recent years. Most click models focus on user behaviors towards a single list. However, with the development of user interface (UI) design, the layout of displayed items on a result page tends to be multi-block (i.e., multi-list) style instead of a single list, which requires different assumptions to model user behaviors more accurately. There exist click models for multi-block pages in desktop contexts, but they cannot be directly applied to mobile scenarios due to different interaction manners, result types and especially multi-block presentation styles. In particular, multi-block mobile pages can normally be decomposed into interleavings of basic vertical blocks and horizontal blocks, thus resulting in typically F-shape forms. To mitigate gaps between desktop and mobile contexts for multi-block pages, we conduct a user eye-tracking study, and identify users' sequential browsing, block skip and comparison patterns on F-shape pages. These findings lead to the design of a novel F-shape Click Model (FSCM), which serves as a general solution to multi-block mobile pages. Firstly, we construct a directed acyclic graph (DAG) for each page, where each item is regarded as a vertex and each edge indicates the user's possible examination flow. Secondly, we propose DAG-structured GRUs and a comparison module to model users' sequential (sequential browsing, block skip) and non-sequential (comparison) behaviors respectively. Finally, we combine GRU states and comparison patterns to perform user click predictions. Experiments on a large-scale real-world dataset validate the effectiveness of FSCM on user behavior predictions compared with baseline models.
2022-07-06 18:00:00 - Tensor networks in machine learning - *Richik Sengupta, Soumik Adhikary, Ivan Oseledets, Jacob Biamonte* - `2207.02851v1` - [abs](http://arxiv.org/abs/2207.02851v1) - [pdf](http://arxiv.org/pdf/2207.02851v1) > A tensor network is a type of decomposition used to express and approximate large arrays of data. A given data-set, quantum state or higher dimensional multi-linear map is factored and approximated by a composition of smaller multi-linear maps. This is reminiscent to how a Boolean function might be decomposed into a gate array: this represents a special case of tensor decomposition, in which the tensor entries are replaced by 0, 1 and the factorisation becomes exact. The collection of associated techniques are called, tensor network methods: the subject developed independently in several distinct fields of study, which have more recently become interrelated through the language of tensor networks. The tantamount questions in the field relate to expressability of tensor networks and the reduction of computational overheads. A merger of tensor networks with machine learning is natural. On the one hand, machine learning can aid in determining a factorization of a tensor network approximating a data set. On the other hand, a given tensor network structure can be viewed as a machine learning model. Herein the tensor network parameters are adjusted to learn or classify a data-set. In this survey we recover the basics of tensor networks and explain the ongoing effort to develop the theory of tensor networks in machine learning.
2022-07-06 19:50:39 - Towards Transparency in Dermatology Image Datasets with Skin Tone Annotations by Experts, Crowds, and an Algorithm - *Matthew Groh, Caleb Harris, Roxana Daneshjou, Omar Badri, Arash Koochek* - `2207.02942v1` - [abs](http://arxiv.org/abs/2207.02942v1) - [pdf](http://arxiv.org/pdf/2207.02942v1) > While artificial intelligence (AI) holds promise for supporting healthcare providers and improving the accuracy of medical diagnoses, a lack of transparency in the composition of datasets exposes AI models to the possibility of unintentional and avoidable mistakes. In particular, public and private image datasets of dermatological conditions rarely include information on skin color. As a start towards increasing transparency, AI researchers have appropriated the use of the Fitzpatrick skin type (FST) from a measure of patient photosensitivity to a measure for estimating skin tone in algorithmic audits of computer vision applications including facial recognition and dermatology diagnosis. In order to understand the variability of estimated FST annotations on images, we compare several FST annotation methods on a diverse set of 460 images of skin conditions from both textbooks and online dermatology atlases. We find the inter-rater reliability between three board-certified dermatologists is comparable to the inter-rater reliability between the board-certified dermatologists and two crowdsourcing methods. In contrast, we find that the Individual Typology Angle converted to FST (ITA-FST) method produces annotations that are significantly less correlated with the experts' annotations than the experts' annotations are correlated with each other. These results demonstrate that algorithms based on ITA-FST are not reliable for annotating large-scale image datasets, but human-centered, crowd-based protocols can reliably add skin type transparency to dermatology datasets. Furthermore, we introduce the concept of dynamic consensus protocols with tunable parameters including expert review that increase the visibility of crowdwork and provide guidance for future crowdsourced annotations of large image datasets.
2022-07-06 20:08:00 - DeepAdversaries: Examining the Robustness of Deep Learning Models for Galaxy Morphology Classification - *Aleksandra Ćiprijanović, Diana Kafkes, Gregory Snyder, F. Javier Sánchez, Gabriel Nathan Perdue, Kevin Pedro, Brian Nord, Sandeep Madireddy, Stefan M. Wild* - `2112.14299v3` - [abs](http://arxiv.org/abs/2112.14299v3) - [pdf](http://arxiv.org/pdf/2112.14299v3) > With increased adoption of supervised deep learning methods for processing and analysis of cosmological survey data, the assessment of data perturbation effects (that can naturally occur in the data processing and analysis pipelines) and the development of methods that increase model robustness are increasingly important. In the context of morphological classification of galaxies, we study the effects of perturbations in imaging data. In particular, we examine the consequences of using neural networks when training on baseline data and testing on perturbed data. We consider perturbations associated with two primary sources: 1) increased observational noise as represented by higher levels of Poisson noise and 2) data processing noise incurred by steps such as image compression or telescope errors as represented by one-pixel adversarial attacks. We also test the efficacy of domain adaptation techniques in mitigating the perturbation-driven errors. We use classification accuracy, latent space visualizations, and latent space distance to assess model robustness. Without domain adaptation, we find that processing pixel-level errors easily flip the classification into an incorrect class and that higher observational noise makes the model trained on low-noise data unable to classify galaxy morphologies. On the other hand, we show that training with domain adaptation improves model robustness and mitigates the effects of these perturbations, improving the classification accuracy by 23% on data with higher observational noise. Domain adaptation also increases by a factor of ~2.3 the latent space distance between the baseline and the incorrectly classified one-pixel perturbed image, making the model more robust to inadvertent perturbations.
2022-07-07 02:01:56 - Vision Transformers: State of the Art and Research Challenges - *Bo-Kai Ruan, Hong-Han Shuai, Wen-Huang Cheng* - `2207.03041v1` - [abs](http://arxiv.org/abs/2207.03041v1) - [pdf](http://arxiv.org/pdf/2207.03041v1) > Transformers have achieved great success in natural language processing. Due to the powerful capability of self-attention mechanism in transformers, researchers develop the vision transformers for a variety of computer vision tasks, such as image recognition, object detection, image segmentation, pose estimation, and 3D reconstruction. This paper presents a comprehensive overview of the literature on different architecture designs and training tricks (including self-supervised learning) for vision transformers. Our goal is to provide a systematic review with the open research opportunities.
2022-07-07 04:49:21 - Word Embedding for Social Sciences: An Interdisciplinary Survey - *Akira Matsui, Emilio Ferrara* - `2207.03086v1` - [abs](http://arxiv.org/abs/2207.03086v1) - [pdf](http://arxiv.org/pdf/2207.03086v1) > To extract essential information from complex data, computer scientists have been developing machine learning models that learn low-dimensional representation mode. From such advances in machine learning research, not only computer scientists but also social scientists have benefited and advanced their research because human behavior or social phenomena lies in complex data. To document this emerging trend, we survey the recent studies that apply word embedding techniques to human behavior mining, building a taxonomy to illustrate the methods and procedures used in the surveyed papers and highlight the recent emerging trends applying word embedding models to non-textual human behavior data. This survey conducts a simple experiment to warn that common similarity measurements used in the literature could yield different results even if they return consistent results at an aggregate level.
2022-07-07 08:35:04 - A Unified Survey on Anomaly, Novelty, Open-Set, and Out-of-Distribution Detection: Solutions and Future Challenges - *Mohammadreza Salehi, Hossein Mirzaei, Dan Hendrycks, Yixuan Li, Mohammad Hossein Rohban, Mohammad Sabokrou* - `2110.14051v4` - [abs](http://arxiv.org/abs/2110.14051v4) - [pdf](http://arxiv.org/pdf/2110.14051v4) > Machine learning models often encounter samples that are diverged from the training distribution. Failure to recognize an out-of-distribution (OOD) sample, and consequently assign that sample to an in-class label significantly compromises the reliability of a model. The problem has gained significant attention due to its importance for safety deploying models in open-world settings. Detecting OOD samples is challenging due to the intractability of modeling all possible unknown distributions. To date, several research domains tackle the problem of detecting unfamiliar samples, including anomaly detection, novelty detection, one-class learning, open set recognition, and out-of-distribution detection. Despite having similar and shared concepts, out-of-distribution, open-set, and anomaly detection have been investigated independently. Accordingly, these research avenues have not cross-pollinated, creating research barriers. While some surveys intend to provide an overview of these approaches, they seem to only focus on a specific domain without examining the relationship between different domains. This survey aims to provide a cross-domain and comprehensive review of numerous eminent works in respective areas while identifying their commonalities. Researchers can benefit from the overview of research advances in different fields and develop future methodology synergistically. Furthermore, to the best of our knowledge, while there are surveys in anomaly detection or one-class learning, there is no comprehensive or up-to-date survey on out-of-distribution detection, which our survey covers extensively. Finally, having a unified cross-domain perspective, we discuss and shed light on future lines of research, intending to bring these fields closer together.
2022-07-07 12:52:04 - Resources for Turkish Natural Language Processing: A critical survey - *Çağrı Çöltekin, A. Seza Doğruöz, Özlem Çetinoğlu* - `2204.05042v2` - [abs](http://arxiv.org/abs/2204.05042v2) - [pdf](http://arxiv.org/pdf/2204.05042v2) > This paper presents a comprehensive survey of corpora and lexical resources available for Turkish. We review a broad range of resources, focusing on the ones that are publicly available. In addition to providing information about the available linguistic resources, we present a set of recommendations, and identify gaps in the data available for conducting research and building applications in Turkish Linguistics and Natural Language Processing.
2022-07-07 14:32:45 - Adposition and Case Supersenses v2.6: Guidelines for English - *Nathan Schneider, Jena D. Hwang, Vivek Srikumar, Archna Bhatia, Na-Rae Han, Tim O'Gorman, Sarah R. Moeller, Omri Abend, Adi Shalev, Austin Blodgett, Jakob Prange* - `1704.02134v8` - [abs](http://arxiv.org/abs/1704.02134v8) - [pdf](http://arxiv.org/pdf/1704.02134v8) > This document offers a detailed linguistic description of SNACS (Semantic Network of Adposition and Case Supersenses; Schneider et al., 2018), an inventory of 52 semantic labels ("supersenses") that characterize the use of adpositions and case markers at a somewhat coarse level of granularity, as demonstrated in the STREUSLE corpus (https://github.com/nert-nlp/streusle/ ; version 4.5 tracks guidelines version 2.6). Though the SNACS inventory aspires to be universal, this document is specific to English; documentation for other languages will be published separately. Version 2 is a revision of the supersense inventory proposed for English by Schneider et al. (2015, 2016) (henceforth "v1"), which in turn was based on previous schemes. The present inventory was developed after extensive review of the v1 corpus annotations for English, plus previously unanalyzed genitive case possessives (Blodgett and Schneider, 2018), as well as consideration of adposition and case phenomena in Hebrew, Hindi, Korean, and German. Hwang et al. (2017) present the theoretical underpinnings of the v2 scheme. Schneider et al. (2018) summarize the scheme, its application to English corpus data, and an automatic disambiguation task. Liu et al. (2021) offer an English Lexical Semantic Recognition tagger that includes SNACS labels in its output. This documentation can also be browsed alongside corpus data on the Xposition website (Gessler et al., 2022): http://www.xposition.org/
2022-07-07 15:52:24 - Can Language Models perform Abductive Commonsense Reasoning? - *Seungone Kim* - `2207.05155v1` - [abs](http://arxiv.org/abs/2207.05155v1) - [pdf](http://arxiv.org/pdf/2207.05155v1) > Abductive Reasoning is a task of inferring the most plausible hypothesis given a set of observations. In literature, the community has approached to solve this challenge by classifying/generating a likely hypothesis that does not contradict with a past observation and future observation. Some of the most well-known benchmarks that tackle this problem are aNLI and aNLG (pronounced as alpha-NLI and alpha-NLG). In this report, I review over some of the methodologies that were attempted to solve this challenge, re-implement the baseline models, and analyze some of the weaknesses that current approaches have. The code and the re-implemented results are available at this link.
2022-07-07 17:20:15 - Fairness and Bias in Robot Learning - *Laura Londoño, Juana Valeria Hurtado, Nora Hertz, Philipp Kellmeyer, Silja Voeneky, Abhinav Valada* - `2207.03444v1` - [abs](http://arxiv.org/abs/2207.03444v1) - [pdf](http://arxiv.org/pdf/2207.03444v1) > Machine learning has significantly enhanced the abilities of robots, enabling them to perform a wide range of tasks in human environments and adapt to our uncertain real world. Recent works in various domains of machine learning have highlighted the importance of accounting for fairness to ensure that these algorithms do not reproduce human biases and consequently lead to discriminatory outcomes. With robot learning systems increasingly performing more and more tasks in our everyday lives, it is crucial to understand the influence of such biases to prevent unintended behavior toward certain groups of people. In this work, we present the first survey on fairness in robot learning from an interdisciplinary perspective spanning technical, ethical, and legal challenges. We propose a taxonomy for sources of bias and the resulting types of discrimination due to them. Using examples from different robot learning domains, we examine scenarios of unfair outcomes and strategies to mitigate them. We present early advances in the field by covering different fairness definitions, ethical and legal considerations, and methods for fair robot learning. With this work, we aim at paving the road for groundbreaking developments in fair robot learning.
2022-07-08 01:40:11 - SETSum: Summarization and Visualization of Student Evaluations of Teaching - *Yinuo Hu, Shiyue Zhang, Viji Sathy, A. T. Panter, Mohit Bansal* - `2207.03640v1` - [abs](http://arxiv.org/abs/2207.03640v1) - [pdf](http://arxiv.org/pdf/2207.03640v1) > Student Evaluations of Teaching (SETs) are widely used in colleges and universities. Typically SET results are summarized for instructors in a static PDF report. The report often includes summary statistics for quantitative ratings and an unsorted list of open-ended student comments. The lack of organization and summarization of the raw comments hinders those interpreting the reports from fully utilizing informative feedback, making accurate inferences, and designing appropriate instructional improvements. In this work, we introduce a novel system, SETSum, that leverages sentiment analysis, aspect extraction, summarization, and visualization techniques to provide organized illustrations of SET findings to instructors and other reviewers. Ten university professors from diverse departments serve as evaluators of the system and all agree that SETSum helps them interpret SET results more efficiently; and 6 out of 10 instructors prefer our system over the standard static PDF report (while the remaining 4 would like to have both). This demonstrates that our work holds the potential to reform the SET reporting conventions in the future. Our code is available at https://github.com/evahuyn/SETSum
2022-07-08 08:47:31 - Emotion detection of social data: APIs comparative study - *Bilal Abu-Salih, Mohammad Alhabashneh, Dengya Zhu, Albara Awajan, Yazan Alshamaileh, Bashar Al-Shboul, Mohammad Alshraideh* - `2207.10654v1` - [abs](http://arxiv.org/abs/2207.10654v1) - [pdf](http://arxiv.org/pdf/2207.10654v1) > The development of emotion detection technology has emerged as a highly valuable possibility in the corporate sector due to the nearly limitless uses of this new discipline, particularly with the unceasing propagation of social data. In recent years, the electronic marketplace has witnessed the establishment of a large number of start-up businesses with an almost sole focus on building new commercial and open-source tools and APIs for emotion detection and recognition. Yet, these tools and APIs must be continuously reviewed and evaluated, and their performances should be reported and discussed. There is a lack of research to empirically compare current emotion detection technologies in terms of the results obtained from each model using the same textual dataset. Also, there is a lack of comparative studies that apply benchmark comparison to social data. This study compares eight technologies; IBM Watson NLU, ParallelDots, Symanto-Ekman, Crystalfeel, Text to Emotion, Senpy, Textprobe, and NLP Cloud. The comparison was undertaken using two different datasets. The emotions from the chosen datasets were then derived using the incorporated APIs. The performance of these APIs was assessed using the aggregated scores that they delivered as well as the theoretically proven evaluation metrics such as the micro-average of accuracy, classification error, precision, recall, and f1-score. Lastly, the assessment of these APIs incorporating the evaluation measures is reported and discussed.
2022-07-08 10:30:50 - BF++: a language for general-purpose program synthesis - *Vadim Liventsev, Aki Härmä, Milan Petković* - `2101.09571v6` - [abs](http://arxiv.org/abs/2101.09571v6) - [pdf](http://arxiv.org/pdf/2101.09571v6) > Most state of the art decision systems based on Reinforcement Learning (RL) are data-driven black-box neural models, where it is often difficult to incorporate expert knowledge into the models or let experts review and validate the learned decision mechanisms. Knowledge-insertion and model review are important requirements in many applications involving human health and safety. One way to bridge the gap between data and knowledge driven systems is program synthesis: replacing a neural network that outputs decisions with a symbolic program generated by a neural network or by means of genetic programming. We propose a new programming language, BF++, designed specifically for automatic programming of agents in a Partially Observable Markov Decision Process (POMDP) setting and apply neural program synthesis to solve standard OpenAI Gym benchmarks.
2022-07-08 17:32:51 - A Multi-tasking Model of Speaker-Keyword Classification for Keeping Human in the Loop of Drone-assisted Inspection - *Yu Li, Anisha Parsan, Bill Wang, Penghao Dong, Shanshan Yao, Ruwen Qin* - `2207.04027v1` - [abs](http://arxiv.org/abs/2207.04027v1) - [pdf](http://arxiv.org/pdf/2207.04027v1) > Audio commands are a preferred communication medium to keep inspectors in the loop of civil infrastructure inspection performed by a semi-autonomous drone. To understand job-specific commands from a group of heterogeneous and dynamic inspectors, a model needs to be developed cost-effectively for the group and easily adapted when the group changes. This paper is motivated to build a multi-tasking deep learning model that possesses a Share-Split-Collaborate architecture. This architecture allows the two classification tasks to share the feature extractor and then split subject-specific and keyword-specific features intertwined in the extracted features through feature projection and collaborative training. A base model for a group of five authorized subjects is trained and tested on the inspection keyword dataset collected by this study. The model achieved a 95.3% or higher mean accuracy in classifying the keywords of any authorized inspectors. Its mean accuracy in speaker classification is 99.2%. Due to the richer keyword representations that the model learns from the pooled training data, adapting the base model to a new inspector requires only a little training data from that inspector, like five utterances per keyword. Using the speaker classification scores for inspector verification can achieve a success rate of at least 93.9% in verifying authorized inspectors and 76.1\% in detecting unauthorized ones. Further, the paper demonstrates the applicability of the proposed model to larger-size groups on a public dataset. This paper provides a solution to addressing challenges facing AI-assisted human-robot interaction, including worker heterogeneity, worker dynamics, and job heterogeneity.
2022-07-08 18:57:15 - From Symmetry to Geometry: Tractable Nonconvex Problems - *Yuqian Zhang, Qing Qu, John Wright* - `2007.06753v4` - [abs](http://arxiv.org/abs/2007.06753v4) - [pdf](http://arxiv.org/pdf/2007.06753v4) > As science and engineering have become increasingly data-driven, the role of optimization has expanded to touch almost every stage of the data analysis pipeline, from signal and data acquisition to modeling and prediction. The optimization problems encountered in practice are often nonconvex. While challenges vary from problem to problem, one common source of nonconvexity is nonlinearity in the data or measurement model. Nonlinear models often exhibit symmetries, creating complicated, nonconvex objective landscapes, with multiple equivalent solutions. Nevertheless, simple methods (e.g., gradient descent) often perform surprisingly well in practice. The goal of this survey is to highlight a class of tractable nonconvex problems, which can be understood through the lens of symmetries. These problems exhibit a characteristic geometric structure: local minimizers are symmetric copies of a single "ground truth" solution, while other critical points occur at balanced superpositions of symmetric copies of the ground truth, and exhibit negative curvature in directions that break the symmetry. This structure enables efficient methods to obtain global minimizers. We discuss examples of this phenomenon arising from a wide range of problems in imaging, signal processing, and data analysis. We highlight the key role of symmetry in shaping the objective landscape and discuss the different roles of rotational and discrete symmetries. This area is rich with observed phenomena and open problems; we close by highlighting directions for future research.
2022-07-08 19:03:00 - Neuroimaging Feature Extraction using a Neural Network Classifier for Imaging Genetics - *Cédric Beaulac, Sidi Wu, Erin Gibson, Michelle F. Miranda, Jiguo Cao, Leno Rocha, Mirza Faisal Beg, Farouk S. Nathoo* - `2207.10794v1` - [abs](http://arxiv.org/abs/2207.10794v1) - [pdf](http://arxiv.org/pdf/2207.10794v1) > A major issue in the association of genes to neuroimaging phenotypes is the high dimension of both genetic data and neuroimaging data. In this article, we tackle the latter problem with an eye toward developing solutions that are relevant for disease prediction. Supported by a vast literature on the predictive power of neural networks, our proposed solution uses neural networks to extract from neuroimaging data features that are relevant for predicting Alzheimer's Disease (AD) for subsequent relation to genetics. Our neuroimaging-genetic pipeline is comprised of image processing, neuroimaging feature extraction and genetic association steps. We propose a neural network classifier for extracting neuroimaging features that are related with disease and a multivariate Bayesian group sparse regression model for genetic association. We compare the predictive power of these features to expert selected features and take a closer look at the SNPs identified with the new neuroimaging features.
2022-07-08 21:31:43 - Imagined versus Remembered Stories: Quantifying Differences in Narrative Flow - *Maarten Sap, Anna Jafarpour, Yejin Choi, Noah A. Smith, James W. Pennebaker, Eric Horvitz* - `2201.02662v2` - [abs](http://arxiv.org/abs/2201.02662v2) - [pdf](http://arxiv.org/pdf/2201.02662v2) > Lifelong experiences and learned knowledge lead to shared expectations about how common situations tend to unfold. Such knowledge of narrative event flow enables people to weave together a story. However, comparable computational tools to evaluate the flow of events in narratives are limited. We quantify the differences between autobiographical and imagined stories by introducing sequentiality, a measure of narrative flow of events, drawing probabilistic inferences from a cutting-edge large language model (GPT-3). Sequentiality captures the flow of a narrative by comparing the probability of a sentence with and without its preceding story context. We applied our measure to study thousands of diary-like stories, collected from crowdworkers about either a recent remembered experience or an imagined story on the same topic. The results show that imagined stories have higher sequentiality than autobiographical stories and that the sequentiality of autobiographical stories increases when the memories are retold several months later. In pursuit of deeper understandings of how sequentiality measures the flow of narratives, we explore proportions of major and minor events in story sentences, as annotated by crowdworkers. We find that lower sequentiality is associated with higher proportions of major events. The methods and results highlight opportunities to use cutting-edge computational analyses, such as sequentiality, on large corpora of matched imagined and autobiographical stories to investigate the influences of memory and reasoning on language generation processes.
2022-07-09 00:02:08 - A Survey of Task-Based Machine Learning Content Extraction Services for VIDINT - *Joshua Brunk, Nathan Jermann, Ryan Sharp, Carl D. Hoover* - `2207.04158v1` - [abs](http://arxiv.org/abs/2207.04158v1) - [pdf](http://arxiv.org/pdf/2207.04158v1) > This paper provides a comparison of current video content extraction tools with a focus on comparing commercial task-based machine learning services. Video intelligence (VIDINT) data has become a critical intelligence source in the past decade. The need for AI-based analytics and automation tools to extract and structure content from video has quickly become a priority for organizations needing to search, analyze and exploit video at scale. With rapid growth in machine learning technology, the maturity of machine transcription, machine translation, topic tagging, and object recognition tasks are improving at an exponential rate, breaking performance records in speed and accuracy as new applications evolve. Each section of this paper reviews and compares products, software resources and video analytics capabilities based on tasks relevant to extracting information from video with machine learning techniques.
2022-07-09 01:38:15 - A Systematic Review and Thematic Analysis of Community-Collaborative Approaches to Computing Research - *Ned Cooper, Tiffanie Horne, Gillian Hayes, Courtney Heldreth, Michal Lahav, Jess Scon Holbrook, Lauren Wilcox* - `2207.04171v1` - [abs](http://arxiv.org/abs/2207.04171v1) - [pdf](http://arxiv.org/pdf/2207.04171v1) > HCI researchers have been gradually shifting attention from individual users to communities when engaging in research, design, and system development. However, our field has yet to establish a cohesive, systematic understanding of the challenges, benefits, and commitments of community-collaborative approaches to research. We conducted a systematic review and thematic analysis of 47 computing research papers discussing participatory research with communities for the development of technological artifacts and systems, published over the last two decades. From this review, we identified seven themes associated with the evolution of a project: from establishing community partnerships to sustaining results. Our findings suggest that several tensions characterize these projects, many of which relate to the power and position of researchers, and the computing research environment, relative to community partners. We discuss the implications of our findings and offer methodological proposals to guide HCI, and computing research more broadly, towards practices that center communities.
2022-07-09 03:49:51 - Learning Resolution-Adaptive Representations for Cross-Resolution Person Re-Identification - *Lin Wu, Lingqiao Liu, Yang Wang, Zheng Zhang, Farid Boussaid, Mohammed Bennamoun* - `2207.13037v1` - [abs](http://arxiv.org/abs/2207.13037v1) - [pdf](http://arxiv.org/pdf/2207.13037v1) > The cross-resolution person re-identification (CRReID) problem aims to match low-resolution (LR) query identity images against high resolution (HR) gallery images. It is a challenging and practical problem since the query images often suffer from resolution degradation due to the different capturing conditions from real-world cameras. To address this problem, state-of-the-art (SOTA) solutions either learn the resolution-invariant representation or adopt super-resolution (SR) module to recover the missing information from the LR query. This paper explores an alternative SR-free paradigm to directly compare HR and LR images via a dynamic metric, which is adaptive to the resolution of a query image. We realize this idea by learning resolution-adaptive representations for cross-resolution comparison. Specifically, we propose two resolution-adaptive mechanisms. The first one disentangles the resolution-specific information into different sub-vectors in the penultimate layer of the deep neural networks, and thus creates a varying-length representation. To better extract resolution-dependent information, we further propose to learn resolution-adaptive masks for intermediate residual feature blocks. A novel progressive learning strategy is proposed to train those masks properly. These two mechanisms are combined to boost the performance of CRReID. Experimental results show that the proposed method is superior to existing approaches and achieves SOTA performance on multiple CRReID benchmarks.
2022-07-09 04:05:06 - Pseudo-Pair based Self-Similarity Learning for Unsupervised Person Re-identification - *Lin Wu, Deyin Liu, Wenying Zhang, Dapeng Chen, Zongyuan Ge, Farid Boussaid, Mohammed Bennamoun, Jialie Shen* - `2207.13035v1` - [abs](http://arxiv.org/abs/2207.13035v1) - [pdf](http://arxiv.org/pdf/2207.13035v1) > Person re-identification (re-ID) is of great importance to video surveillance systems by estimating the similarity between a pair of cross-camera person shorts. Current methods for estimating such similarity require a large number of labeled samples for supervised training. In this paper, we present a pseudo-pair based self-similarity learning approach for unsupervised person re-ID without human annotations. Unlike conventional unsupervised re-ID methods that use pseudo labels based on global clustering, we construct patch surrogate classes as initial supervision, and propose to assign pseudo labels to images through the pairwise gradient-guided similarity separation. This can cluster images in pseudo pairs, and the pseudos can be updated during training. Based on pseudo pairs, we propose to improve the generalization of similarity function via a novel self-similarity learning:it learns local discriminative features from individual images via intra-similarity, and discovers the patch correspondence across images via inter-similarity. The intra-similarity learning is based on channel attention to detect diverse local features from an image. The inter-similarity learning employs a deformable convolution with a non-local block to align patches for cross-image similarity. Experimental results on several re-ID benchmark datasets demonstrate the superiority of the proposed method over the state-of-the-arts.
2022-07-09 11:08:06 - Knowledge-aware Document Summarization: A Survey of Knowledge, Embedding Methods and Architectures - *Yutong Qu, Wei Emma Zhang, Jian Yang, Lingfei Wu, Jia Wu* - `2204.11190v2` - [abs](http://arxiv.org/abs/2204.11190v2) - [pdf](http://arxiv.org/pdf/2204.11190v2) > Knowledge-aware methods have boosted a range of natural language processing applications over the last decades. With the gathered momentum, knowledge recently has been pumped into enormous attention in document summarization, one of natural language processing applications. Previous works reported that knowledge-embedded document summarizers excel at generating superior digests, especially in terms of informativeness, coherence, and fact consistency. This paper pursues to present the first systematic survey for the state-of-the-art methodologies that embed knowledge into document summarizers. Particularly, we propose novel taxonomies to recapitulate knowledge and knowledge embeddings under the document summarization view. We further explore how embeddings are generated in embedding learning architectures of document summarization models, especially of deep learning models. At last, we discuss the challenges of this topic and future directions.
2022-07-09 22:12:17 - Automated Identification of Toxic Code Reviews Using ToxiCR - *Jaydeb Sarker, Asif Kamal Turzo, Ming Dong, Amiangshu Bosu* - `2202.13056v2` - [abs](http://arxiv.org/abs/2202.13056v2) - [pdf](http://arxiv.org/pdf/2202.13056v2) > Toxic conversations during software development interactions may have serious repercussions on a Free and Open Source Software (FOSS) development project. For example, victims of toxic conversations may become afraid to express themselves, therefore get demotivated, and may eventually leave the project. Automated filtering of toxic conversations may help a FOSS community to maintain healthy interactions among its members. However, off-the-shelf toxicity detectors perform poorly on Software Engineering (SE) dataset, such as one curated from code review comments. To encounter this challenge, we present ToxiCR, a supervised learning-based toxicity identification tool for code review interactions. ToxiCR includes a choice to select one of the ten supervised learning algorithms, an option to select text vectorization techniques, eight preprocessing steps, and a large scale labeled dataset of 19,571 code review comments. Two out of those eight preprocessing steps are SE domain specific. With our rigorous evaluation of the models with various combinations of preprocessing steps and vectorization techniques, we have identified the best combination for our dataset that boosts 95.8% accuracy and 88.9% F1 score. ToxiCR significantly outperforms existing toxicity detectors on our dataset. We have released our dataset, pretrained models, evaluation results, and source code publicly available at: https://github.com/WSU-SEAL/ToxiCR
2022-07-09 23:19:58 - Recent, rapid advancement in visual question answering architecture: a review - *Venkat Kodali, Daniel Berleant* - `2203.01322v4` - [abs](http://arxiv.org/abs/2203.01322v4) - [pdf](http://arxiv.org/pdf/2203.01322v4) > Understanding visual question answering is going to be crucial for numerous human activities. However, it presents major challenges at the heart of the artificial intelligence endeavor. This paper presents an update on the rapid advancements in visual question answering using images that have occurred in the last couple of years. Tremendous growth in research on improving visual question answering system architecture has been published recently, showing the importance of multimodal architectures. Several points on the benefits of visual question answering are mentioned in the review paper by Manmadhan et al. (2020), on which the present article builds, including subsequent updates in the field.
2022-07-10 18:04:05 - DecisioNet: A Binary-Tree Structured Neural Network - *Noam Gottlieb, Michael Werman* - `2207.01127v3` - [abs](http://arxiv.org/abs/2207.01127v3) - [pdf](http://arxiv.org/pdf/2207.01127v3) > Deep neural networks (DNNs) and decision trees (DTs) are both state-of-the-art classifiers. DNNs perform well due to their representational learning capabilities, while DTs are computationally efficient as they perform inference along one route (root-to-leaf) that is dependent on the input data. In this paper, we present DecisioNet (DN), a binary-tree structured neural network. We propose a systematic way to convert an existing DNN into a DN to create a lightweight version of the original model. DecisioNet takes the best of both worlds - it uses neural modules to perform representational learning and utilizes its tree structure to perform only a portion of the computations. We evaluate various DN architectures, along with their corresponding baseline models on the FashionMNIST, CIFAR10, and CIFAR100 datasets. We show that the DN variants achieve similar accuracy while significantly reducing the computational cost of the original network.
2022-07-10 21:42:00 - Point Label Aware Superpixels for Multi-species Segmentation of Underwater Imagery - *Scarlett Raine, Ross Marchant, Brano Kusy, Frederic Maire, Tobias Fischer* - `2202.13487v2` - [abs](http://arxiv.org/abs/2202.13487v2) - [pdf](http://arxiv.org/pdf/2202.13487v2) > Monitoring coral reefs using underwater vehicles increases the range of marine surveys and availability of historical ecological data by collecting significant quantities of images. Analysis of this imagery can be automated using a model trained to perform semantic segmentation, however it is too costly and time-consuming to densely label images for training supervised models. In this letter, we leverage photo-quadrat imagery labeled by ecologists with sparse point labels. We propose a point label aware method for propagating labels within superpixel regions to obtain augmented ground truth for training a semantic segmentation model. Our point label aware superpixel method utilizes the sparse point labels, and clusters pixels using learned features to accurately generate single-species segments in cluttered, complex coral images. Our method outperforms prior methods on the UCSD Mosaics dataset by 3.62% for pixel accuracy and 8.35% for mean IoU for the label propagation task, while reducing computation time reported by previous approaches by 76%. We train a DeepLabv3+ architecture and outperform state-of-the-art for semantic segmentation by 2.91% for pixel accuracy and 9.65% for mean IoU on the UCSD Mosaics dataset and by 4.19% for pixel accuracy and 14.32% mean IoU for the Eilat dataset.
2022-07-10 22:12:00 - Depth Perspective-aware Multiple Object Tracking - *Kha Gia Quach, Huu Le, Pha Nguyen, Chi Nhan Duong, Tien Dai Bui, Khoa Luu* - `2207.04551v1` - [abs](http://arxiv.org/abs/2207.04551v1) - [pdf](http://arxiv.org/pdf/2207.04551v1) > This paper aims to tackle Multiple Object Tracking (MOT), an important problem in computer vision but remains challenging due to many practical issues, especially occlusions. Indeed, we propose a new real-time Depth Perspective-aware Multiple Object Tracking (DP-MOT) approach to tackle the occlusion problem in MOT. A simple yet efficient Subject-Ordered Depth Estimation (SODE) is first proposed to automatically order the depth positions of detected subjects in a 2D scene in an unsupervised manner. Using the output from SODE, a new Active pseudo-3D Kalman filter, a simple but effective extension of Kalman filter with dynamic control variables, is then proposed to dynamically update the movement of objects. In addition, a new high-order association approach is presented in the data association step to incorporate first-order and second-order relationships between the detected objects. The proposed approach consistently achieves state-of-the-art performance compared to recent MOT methods on standard MOT benchmarks.
2022-07-11 01:24:38 - Making Images Real Again: A Comprehensive Survey on Deep Image Composition - *Li Niu, Wenyan Cong, Liu Liu, Yan Hong, Bo Zhang, Jing Liang, Liqing Zhang* - `2106.14490v2` - [abs](http://arxiv.org/abs/2106.14490v2) - [pdf](http://arxiv.org/pdf/2106.14490v2) > As a common image editing operation, image composition aims to cut the foreground from one image and paste it on another image, resulting in a composite image. However, there are many issues that could make the composite images unrealistic. These issues can be summarized as the inconsistency between foreground and background, which includes appearance inconsistency (e.g., incompatible illumination), geometry inconsistency (e.g., unreasonable size), and semantic inconsistency (e.g., mismatched semantic context). Previous works divide image composition task into multiple sub-tasks, in which each sub-task targets at one or more issues. Specifically, object placement aims to find reasonable scale, location, and shape for the foreground. Image blending aims to address the unnatural boundary between foreground and background. Image harmonization aims to adjust the illumination statistics of foreground. Shadow generation aims to generate plausible shadow for the foreground. By putting all the abovementioned efforts together, we can acquire realistic composite images. To the best of our knowledge, there is no previous survey on image composition. In this paper, we conduct comprehensive survey over the sub-tasks of image composition. For each sub-task, we summarize the traditional methods, deep learning based methods, datasets and evaluation. We also point out the limitations of existing methods in each sub-task and the problem of the whole image composition task. Datasets and codes for image composition are summarized at https://github.com/bcmi/Awesome-Image-Composition.
2022-07-11 02:39:34 - A Survey on Recent Advances and Challenges in Reinforcement Learning Methods for Task-Oriented Dialogue Policy Learning - *Wai-Chung Kwan, Hongru Wang, Huimin Wang, Kam-Fai Wong* - `2202.13675v2` - [abs](http://arxiv.org/abs/2202.13675v2) - [pdf](http://arxiv.org/pdf/2202.13675v2) > Dialogue Policy Learning is a key component in a task-oriented dialogue system (TDS) that decides the next action of the system given the dialogue state at each turn. Reinforcement Learning (RL) is commonly chosen to learn the dialogue policy, regarding the user as the environment and the system as the agent. Many benchmark datasets and algorithms have been created to facilitate the development and evaluation of dialogue policy based on RL. In this paper, we survey recent advances and challenges in dialogue policy from the prescriptive of RL. More specifically, we identify the major problems and summarize corresponding solutions for RL-based dialogue policy learning. Besides, we provide a comprehensive survey of applying RL to dialogue policy learning by categorizing recent methods into basic elements in RL. We believe this survey can shed a light on future research in dialogue management.
2022-07-11 11:41:39 - A Survey on Sentiment and Emotion Analysis for Computational Literary Studies - *Evgeny Kim, Roman Klinger* - `1808.03137v4` - [abs](http://arxiv.org/abs/1808.03137v4) - [pdf](http://arxiv.org/pdf/1808.03137v4) > Emotions are a crucial part of compelling narratives: literature tells us about people with goals, desires, passions, and intentions. Emotion analysis is part of the broader and larger field of sentiment analysis, and receives increasing attention in literary studies. In the past, the affective dimension of literature was mainly studied in the context of literary hermeneutics. However, with the emergence of the research field known as Digital Humanities (DH), some studies of emotions in a literary context have taken a computational turn. Given the fact that DH is still being formed as a field, this direction of research can be rendered relatively new. In this survey, we offer an overview of the existing body of research on emotion analysis as applied to literature. The research under review deals with a variety of topics including tracking dramatic changes of a plot development, network analysis of a literary text, and understanding the emotionality of texts, among other topics.
2022-07-11 12:48:50 - Self-Supervised Contrastive Pre-Training For Time Series via Time-Frequency Consistency - *Xiang Zhang, Ziyuan Zhao, Theodoros Tsiligkaridis, Marinka Zitnik* - `2206.08496v2` - [abs](http://arxiv.org/abs/2206.08496v2) - [pdf](http://arxiv.org/pdf/2206.08496v2) > Pre-training on time series poses a unique challenge due to the potential mismatch between pre-training and target domains, such as shifts in temporal dynamics, fast-evolving trends, and long-range and short cyclic effects, which can lead to poor downstream performance. While domain adaptation methods can mitigate these shifts, most methods need examples directly from the target domain, making them suboptimal for pre-training. To address this challenge, methods need to accommodate target domains with different temporal dynamics and be capable of doing so without seeing any target examples during pre-training. Relative to other modalities, in time series, we expect that time-based and frequency-based representations of the same example are located close together in the time-frequency space. To this end, we posit that time-frequency consistency (TF-C) -- embedding a time-based neighborhood of a particular example close to its frequency-based neighborhood and back -- is desirable for pre-training. Motivated by TF-C, we define a decomposable pre-training model, where the self-supervised signal is provided by the distance between time and frequency components, each individually trained by contrastive estimation. We evaluate the new method on eight datasets, including electrodiagnostic testing, human activity recognition, mechanical fault detection, and physical status monitoring. Experiments against eight state-of-the-art methods show that TF-C outperforms baselines by 15.4% (F1 score) on average in one-to-one settings (e.g., fine-tuning an EEG-pretrained model on EMG data) and by up to 8.4% (F1 score) in challenging one-to-many settings, reflecting the breadth of scenarios that arise in real-world applications. The source code and datasets are available at https: //anonymous.4open.science/r/TFC-pretraining-6B07.
2022-07-11 16:03:01 - A Survey on Deep Learning-based Single Image Crowd Counting: Network Design, Loss Function and Supervisory Signal - *Haoyue Bai, Jiageng Mao, S. -H. Gary Chan* - `2012.15685v2` - [abs](http://arxiv.org/abs/2012.15685v2) - [pdf](http://arxiv.org/pdf/2012.15685v2) > Single image crowd counting is a challenging computer vision problem with wide applications in public safety, city planning, traffic management, etc. With the recent development of deep learning techniques, crowd counting has aroused much attention and achieved great success in recent years. This survey is to provide a comprehensive summary of recent advances on deep learning-based crowd counting techniques via density map estimation by systematically reviewing and summarizing more than 200 works in the area since 2015. Our goals are to provide an up-to-date review of recent approaches, and educate new researchers in this field the design principles and trade-offs. After presenting publicly available datasets and evaluation metrics, we review the recent advances with detailed comparisons on three major design modules for crowd counting: deep neural network designs, loss functions, and supervisory signals. We study and compare the approaches using the public datasets and evaluation metrics. We conclude the survey with some future directions.
2022-07-11 18:00:43 - Risk-averse autonomous systems: A brief history and recent developments from the perspective of optimal control - *Yuheng Wang, Margaret P. Chapman* - `2109.08947v3` - [abs](http://arxiv.org/abs/2109.08947v3) - [pdf](http://arxiv.org/pdf/2109.08947v3) > We present an historical overview about the connections between the analysis of risk and the control of autonomous systems. We offer two main contributions. Our first contribution is to propose three overlapping paradigms to classify the vast body of literature: the worst-case, risk-neutral, and risk-averse paradigms. We consider an appropriate assessment for the risk of an autonomous system to depend on the application at hand. In contrast, it is typical to assess risk using an expectation, variance, or probability alone. Our second contribution is to unify the concepts of risk and autonomous systems. We achieve this by connecting approaches for quantifying and optimizing the risk that arises from a system's behaviour across academic fields. The survey is highly multidisciplinary. We include research from the communities of reinforcement learning, stochastic and robust control theory, operations research, and formal verification. We describe both model-based and model-free methods, with emphasis on the former. Lastly, we highlight fruitful areas for further research. A key direction is to blend risk-averse model-based and model-free methods to enhance the real-time adaptive capabilities of systems to improve human and environmental welfare.
2022-07-11 18:42:52 - LIP: Lightweight Intelligent Preprocessor for meaningful text-to-speech - *Harshvardhan Anand, Nansi Begam, Richa Verma, Sourav Ghosh, Harichandana B. S. S, Sumit Kumar* - `2207.07118v1` - [abs](http://arxiv.org/abs/2207.07118v1) - [pdf](http://arxiv.org/pdf/2207.07118v1) > Existing Text-to-Speech (TTS) systems need to read messages from the email which may have Personal Identifiable Information (PII) to text messages that can have a streak of emojis and punctuation. 92% of the world's online population use emoji with more than 10 billion emojis sent everyday. Lack of preprocessor leads to messages being read as-is including punctuation and infographics like emoticons. This problem worsens if there is a continuous sequence of punctuation/emojis that are quite common in real-world communications like messaging, Social Networking Site (SNS) interactions, etc. In this work, we aim to introduce a lightweight intelligent preprocessor (LIP) that can enhance the readability of a message before being passed downstream to existing TTS systems. We propose multiple sub-modules including: expanding contraction, censoring swear words, and masking of PII, as part of our preprocessor to enhance the readability of text. With a memory footprint of only 3.55 MB and inference time of 4 ms for up to 50-character text, our solution is suitable for real-time deployment. This work being the first of its kind, we try to benchmark with an open independent survey, the result of which shows 76.5% preference towards LIP enabled TTS engine as compared to standard TTS.
2022-07-11 21:55:05 - Modern Views of Machine Learning for Precision Psychiatry - *Zhe Sage Chen, Prathamesh, Kulkarni, Isaac R. Galatzer-Levy, Benedetta Bigio, Carla Nasca, Yu Zhang* - `2204.01607v2` - [abs](http://arxiv.org/abs/2204.01607v2) - [pdf](http://arxiv.org/pdf/2204.01607v2) > In light of the NIMH's Research Domain Criteria (RDoC), the advent of functional neuroimaging, novel technologies and methods provide new opportunities to develop precise and personalized prognosis and diagnosis of mental disorders. Machine learning (ML) and artificial intelligence (AI) technologies are playing an increasingly critical role in the new era of precision psychiatry. Combining ML/AI with neuromodulation technologies can potentially provide explainable solutions in clinical practice and effective therapeutic treatment. Advanced wearable and mobile technologies also call for the new role of ML/AI for digital phenotyping in mobile mental health. In this review, we provide a comprehensive review of the ML methodologies and applications by combining neuroimaging, neuromodulation, and advanced mobile technologies in psychiatry practice. Additionally, we review the role of ML in molecular phenotyping and cross-species biomarker identification in precision psychiatry. We further discuss explainable AI (XAI) and causality testing in a closed-human-in-the-loop manner, and highlight the ML potential in multimedia information extraction and multimodal data fusion. Finally, we discuss conceptual and practical challenges in precision psychiatry and highlight ML opportunities in future research.
2022-07-11 22:07:30 - Facial-Sketch Synthesis: A New Challenge - *Deng-Ping Fan, Ziling Huang, Peng Zheng, Hong Liu, Xuebin Qin, Luc Van Gool* - `2112.15439v6` - [abs](http://arxiv.org/abs/2112.15439v6) - [pdf](http://arxiv.org/pdf/2112.15439v6) > This paper aims to conduct a comprehensive study on facial-sketch synthesis (FSS). However, due to the high costs of obtaining hand-drawn sketch datasets, there lacks a complete benchmark for assessing the development of FSS algorithms over the last decade. We first introduce a high-quality dataset for FSS, named FS2K, which consists of 2,104 image-sketch pairs spanning three types of sketch styles, image backgrounds, lighting conditions, skin colors, and facial attributes. FS2K differs from previous FSS datasets in difficulty, diversity, and scalability and should thus facilitate the progress of FSS research. Second, we present the largest-scale FSS investigation by reviewing 89 classical methods, including 25 handcrafted feature-based facial-sketch synthesis approaches, 29 general translation methods, and 35 image-to-sketch approaches. Besides, we elaborate comprehensive experiments on the existing 19 cutting-edge models. Third, we present a simple baseline for FSS, named FSGAN. With only two straightforward components, i.e., facial-aware masking and style-vector expansion, FSGAN surpasses the performance of all previous state-of-the-art models on the proposed FS2K dataset by a large margin. Finally, we conclude with lessons learned over the past years and point out several unsolved challenges. Our code is available at https://github.com/DengPingFan/FSGAN.
2022-07-12 02:44:40 - A Survey on Table Question Answering: Recent Advances - *Nengzheng Jin, Joanna Siebert, Dongfang Li, Qingcai Chen* - `2207.05270v1` - [abs](http://arxiv.org/abs/2207.05270v1) - [pdf](http://arxiv.org/pdf/2207.05270v1) > Table Question Answering (Table QA) refers to providing precise answers from tables to answer a user's question. In recent years, there have been a lot of works on table QA, but there is a lack of comprehensive surveys on this research topic. Hence, we aim to provide an overview of available datasets and representative methods in table QA. We classify existing methods for table QA into five categories according to their techniques, which include semantic-parsing-based, generative, extractive, matching-based, and retriever-reader-based methods. Moreover, as table QA is still a challenging task for existing methods, we also identify and outline several key challenges and discuss the potential future directions of table QA.
2022-07-12 06:51:53 - Face editing with GAN -- A Review - *Parthak Mehta, Sarthak Mishra, Nikhil Chouhan, Neel Pethani, Ishani Saha* - `2207.11227v1` - [abs](http://arxiv.org/abs/2207.11227v1) - [pdf](http://arxiv.org/pdf/2207.11227v1) > In recent years, Generative Adversarial Networks (GANs) have become a hot topic among researchers and engineers that work with deep learning. It has been a ground-breaking technique which can generate new pieces of content of data in a consistent way. The topic of GANs has exploded in popularity due to its applicability in fields like image generation and synthesis, and music production and composition. GANs have two competing neural networks: a generator and a discriminator. The generator is used to produce new samples or pieces of content, while the discriminator is used to recognize whether the piece of content is real or generated. What makes it different from other generative models is its ability to learn unlabeled samples. In this review paper, we will discuss the evolution of GANs, several improvements proposed by the authors and a brief comparison between the different models. Index Terms generative adversarial networks, unsupervised learning, deep learning.
2022-07-12 08:48:32 - Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey - *Cédric Colas, Tristan Karch, Olivier Sigaud, Pierre-Yves Oudeyer* - `2012.09830v7` - [abs](http://arxiv.org/abs/2012.09830v7) - [pdf](http://arxiv.org/pdf/2012.09830v7) > Building autonomous machines that can explore open-ended environments, discover possible interactions and build repertoires of skills is a general objective of artificial intelligence. Developmental approaches argue that this can only be achieved by $autotelic$ $agents$: intrinsically motivated learning agents that can learn to represent, generate, select and solve their own problems. In recent years, the convergence of developmental approaches with deep reinforcement learning (RL) methods has been leading to the emergence of a new field: $developmental$ $reinforcement$ $learning$. Developmental RL is concerned with the use of deep RL algorithms to tackle a developmental problem -- the $intrinsically$ $motivated$ $acquisition$ $of$ $open$-$ended$ $repertoires$ $of$ $skills$. The self-generation of goals requires the learning of compact goal encodings as well as their associated goal-achievement functions. This raises new challenges compared to standard RL algorithms originally designed to tackle pre-defined sets of goals using external reward signals. The present paper introduces developmental RL and proposes a computational framework based on goal-conditioned RL to tackle the intrinsically motivated skills acquisition problem. It proceeds to present a typology of the various goal representations used in the literature, before reviewing existing methods to learn to represent and prioritize goals in autonomous systems. We finally close the paper by discussing some open challenges in the quest of intrinsically motivated skills acquisition.
2022-07-12 10:34:03 - A review of schemes for fingerprint image quality computation - *Fernando Alonso-Fernandez, Julian Fierrez-Aguilar, Javier Ortega-Garcia* - `2207.05449v1` - [abs](http://arxiv.org/abs/2207.05449v1) - [pdf](http://arxiv.org/pdf/2207.05449v1) > Fingerprint image quality affects heavily the performance of fingerprint recognition systems. This paper reviews existing approaches for fingerprint image quality computation. We also implement, test and compare a selection of them using the MCYT database including 9000 fingerprint images. Experimental results show that most of the algorithms behave similarly.
2022-07-12 11:30:34 - On the limits of perceptual quality measures for enhanced underwater images - *Chau Yi Li, Andrea Cavallaro* - `2207.05470v1` - [abs](http://arxiv.org/abs/2207.05470v1) - [pdf](http://arxiv.org/pdf/2207.05470v1) > The appearance of objects in underwater images is degraded by the selective attenuation of light, which reduces contrast and causes a colour cast. This degradation depends on the water environment, and increases with depth and with the distance of the object from the camera. Despite an increasing volume of works in underwater image enhancement and restoration, the lack of a commonly accepted evaluation measure is hindering the progress as it is difficult to compare methods. In this paper, we review commonly used colour accuracy measures, such as colour reproduction error and CIEDE2000, and no-reference image quality measures, such as UIQM, UCIQE and CCF, which have not yet been systematically validated. We show that none of the no-reference quality measures satisfactorily rates the quality of enhanced underwater images and discuss their main shortcomings. Images and results are available at https://puiqe.eecs.qmul.ac.uk.
2022-07-12 11:43:33 - VertXNet: Automatic Segmentation and Identification of Lumbar and Cervical Vertebrae from Spinal X-ray Images - *Yao Chen, Yuanhan Mo, Aimee Readie, Gregory Ligozio, Thibaud Coroller, Bartlomiej W. Papiez* - `2207.05476v1` - [abs](http://arxiv.org/abs/2207.05476v1) - [pdf](http://arxiv.org/pdf/2207.05476v1) > Manual annotation of vertebrae on spinal X-ray imaging is costly and time-consuming due to bone shape complexity and image quality variations. In this study, we address this challenge by proposing an ensemble method called VertXNet, to automatically segment and label vertebrae in X-ray spinal images. VertXNet combines two state-of-the-art segmentation models, namely U-Net and Mask R-CNN to improve vertebrae segmentation. A main feature of VertXNet is to also infer vertebrae labels thanks to its Mask R-CNN component (trained to detect 'reference' vertebrae) on a given spinal X-ray image. VertXNet was evaluated on an in-house dataset of lateral cervical and lumbar X-ray imaging for ankylosing spondylitis (AS) patients. Our results show that VertXNet can accurately label spinal X-rays (mean Dice of 0.9). It can be used to circumvent the lack of annotated vertebrae without requiring human expert review. This step is crucial to investigate clinical associations by solving the lack of segmentation, a common bottleneck for most computational imaging projects.
2022-07-12 12:38:10 - Improving the Robustness and Generalization of Deep Neural Network with Confidence Threshold Reduction - *Xiangyuan Yang, Jie Lin, Hanlin Zhang, Xinyu Yang, Peng Zhao* - `2206.00913v2` - [abs](http://arxiv.org/abs/2206.00913v2) - [pdf](http://arxiv.org/pdf/2206.00913v2) > Deep neural networks are easily attacked by imperceptible perturbation. Presently, adversarial training (AT) is the most effective method to enhance the robustness of the model against adversarial examples. However, because adversarial training solved a min-max value problem, in comparison with natural training, the robustness and generalization are contradictory, i.e., the robustness improvement of the model will decrease the generalization of the model. To address this issue, in this paper, a new concept, namely confidence threshold (CT), is introduced and the reducing of the confidence threshold, known as confidence threshold reduction (CTR), is proven to improve both the generalization and robustness of the model. Specifically, to reduce the CT for natural training (i.e., for natural training with CTR), we propose a mask-guided divergence loss function (MDL) consisting of a cross-entropy loss term and an orthogonal term. The empirical and theoretical analysis demonstrates that the MDL loss improves the robustness and generalization of the model simultaneously for natural training. However, the model robustness improvement of natural training with CTR is not comparable to that of adversarial training. Therefore, for adversarial training, we propose a standard deviation loss function (STD), which minimizes the difference in the probabilities of the wrong categories, to reduce the CT by being integrated into the loss function of adversarial training. The empirical and theoretical analysis demonstrates that the STD based loss function can further improve the robustness of the adversarially trained model on basis of guaranteeing the changeless or slight improvement of the natural accuracy.
2022-07-12 13:02:52 - Equivariance versus Augmentation for Spherical Images - *Jan E. Gerken, Oscar Carlsson, Hampus Linander, Fredrik Ohlsson, Christoffer Petersson, Daniel Persson* - `2202.03990v2` - [abs](http://arxiv.org/abs/2202.03990v2) - [pdf](http://arxiv.org/pdf/2202.03990v2) > We analyze the role of rotational equivariance in convolutional neural networks (CNNs) applied to spherical images. We compare the performance of the group equivariant networks known as S2CNNs and standard non-equivariant CNNs trained with an increasing amount of data augmentation. The chosen architectures can be considered baseline references for the respective design paradigms. Our models are trained and evaluated on single or multiple items from the MNIST or FashionMNIST dataset projected onto the sphere. For the task of image classification, which is inherently rotationally invariant, we find that by considerably increasing the amount of data augmentation and the size of the networks, it is possible for the standard CNNs to reach at least the same performance as the equivariant network. In contrast, for the inherently equivariant task of semantic segmentation, the non-equivariant networks are consistently outperformed by the equivariant networks with significantly fewer parameters. We also analyze and compare the inference latency and training times of the different networks, enabling detailed tradeoff considerations between equivariant architectures and data augmentation for practical problems. The equivariant spherical networks used in the experiments are available at https://github.com/JanEGerken/sem_seg_s2cnn .
2022-07-12 16:39:54 - Dynamic Gradient Reactivation for Backward Compatible Person Re-identification - *Xiao Pan, Hao Luo, Weihua Chen, Fan Wang, Hao Li, Wei Jiang, Jianming Zhang, Jianyang Gu, Peike Li* - `2207.05658v1` - [abs](http://arxiv.org/abs/2207.05658v1) - [pdf](http://arxiv.org/pdf/2207.05658v1) > We study the backward compatible problem for person re-identification (Re-ID), which aims to constrain the features of an updated new model to be comparable with the existing features from the old model in galleries. Most of the existing works adopt distillation-based methods, which focus on pushing new features to imitate the distribution of the old ones. However, the distillation-based methods are intrinsically sub-optimal since it forces the new feature space to imitate the inferior old feature space. To address this issue, we propose the Ranking-based Backward Compatible Learning (RBCL), which directly optimizes the ranking metric between new features and old features. Different from previous methods, RBCL only pushes the new features to find best-ranking positions in the old feature space instead of strictly alignment, and is in line with the ultimate goal of backward retrieval. However, the sharp sigmoid function used to make the ranking metric differentiable also incurs the gradient vanish issue, therefore stems the ranking refinement during the later period of training. To address this issue, we propose the Dynamic Gradient Reactivation (DGR), which can reactivate the suppressed gradients by adding dynamic computed constant during forward step. To further help targeting the best-ranking positions, we include the Neighbor Context Agents (NCAs) to approximate the entire old feature space during training. Unlike previous works which only test on the in-domain settings, we make the first attempt to introduce the cross-domain settings (including both supervised and unsupervised), which are more meaningful and difficult. The experimental results on all five settings show that the proposed RBCL outperforms previous state-of-the-art methods by large margins under all settings.
2022-07-12 18:27:19 - Exploration in Deep Reinforcement Learning: A Comprehensive Survey - *Tianpei Yang, Hongyao Tang, Chenjia Bai, Jinyi Liu, Jianye Hao, Zhaopeng Meng, Peng Liu, Zhen Wang* - `2109.06668v4` - [abs](http://arxiv.org/abs/2109.06668v4) - [pdf](http://arxiv.org/pdf/2109.06668v4) > Deep Reinforcement Learning (DRL) and Deep Multi-agent Reinforcement Learning (MARL) have achieved significant successes across a wide range of domains, including game AI, autonomous vehicles, robotics, and so on. However, DRL and deep MARL agents are widely known to be sample inefficient that millions of interactions are usually needed even for relatively simple problem settings, thus preventing the wide application and deployment in real-industry scenarios. One bottleneck challenge behind is the well-known exploration problem, i.e., how efficiently exploring the environment and collecting informative experiences that could benefit policy learning towards the optimal ones. This problem becomes more challenging in complex environments with sparse rewards, noisy distractions, long horizons, and non-stationary co-learners. In this paper, we conduct a comprehensive survey on existing exploration methods for both single-agent and multi-agent RL. We start the survey by identifying several key challenges to efficient exploration. Beyond the above two main branches, we also include other notable exploration methods with different ideas and techniques. In addition to algorithmic analysis, we provide a comprehensive and unified empirical comparison of different exploration methods for DRL on a set of commonly used benchmarks. According to our algorithmic and empirical investigation, we finally summarize the open problems of exploration in DRL and deep MARL and point out a few future directions.
2022-07-12 19:51:47 - Predictive Coding: a Theoretical and Experimental Review - *Beren Millidge, Anil Seth, Christopher L Buckley* - `2107.12979v4` - [abs](http://arxiv.org/abs/2107.12979v4) - [pdf](http://arxiv.org/pdf/2107.12979v4) > Predictive coding offers a potentially unifying account of cortical function -- postulating that the core function of the brain is to minimize prediction errors with respect to a generative model of the world. The theory is closely related to the Bayesian brain framework and, over the last two decades, has gained substantial influence in the fields of theoretical and cognitive neuroscience. A large body of research has arisen based on both empirically testing improved and extended theoretical and mathematical models of predictive coding, as well as in evaluating their potential biological plausibility for implementation in the brain and the concrete neurophysiological and psychological predictions made by the theory. Despite this enduring popularity, however, no comprehensive review of predictive coding theory, and especially of recent developments in this field, exists. Here, we provide a comprehensive review both of the core mathematical structure and logic of predictive coding, thus complementing recent tutorials in the literature. We also review a wide range of classic and recent work within the framework, ranging from the neurobiologically realistic microcircuits that could implement predictive coding, to the close relationship between predictive coding and the widely-used backpropagation of error algorithm, as well as surveying the close relationships between predictive coding and modern machine learning techniques.
2022-07-13 00:21:46 - A Review of Generalized Zero-Shot Learning Methods - *Farhad Pourpanah, Moloud Abdar, Yuxuan Luo, Xinlei Zhou, Ran Wang, Chee Peng Lim, Xi-Zhao Wang, Q. M. Jonathan Wu* - `2011.08641v5` - [abs](http://arxiv.org/abs/2011.08641v5) - [pdf](http://arxiv.org/pdf/2011.08641v5) > Generalized zero-shot learning (GZSL) aims to train a model for classifying data samples under the condition that some output classes are unknown during supervised learning. To address this challenging task, GZSL leverages semantic information of the seen (source) and unseen (target) classes to bridge the gap between both seen and unseen classes. Since its introduction, many GZSL models have been formulated. In this review paper, we present a comprehensive review on GZSL. Firstly, we provide an overview of GZSL including the problems and challenges. Then, we introduce a hierarchical categorization for the GZSL methods and discuss the representative methods in each category. In addition, we discuss the available benchmark data sets and applications of GZSL, along with a discussion on the research gaps and directions for future investigations.
2022-07-13 06:25:55 - Developing a Component Comment Extractor from Product Reviews on E-Commerce Sites - *Shogo Anda, Masato Kikuchi, Tadachika Ozono* - `2207.05979v1` - [abs](http://arxiv.org/abs/2207.05979v1) - [pdf](http://arxiv.org/pdf/2207.05979v1) > Consumers often read product reviews to inform their buying decision, as some consumers want to know a specific component of a product. However, because typical sentences on product reviews contain various details, users must identify sentences about components they want to know amongst the many reviews. Therefore, we aimed to develop a system that identifies and collects component and aspect information of products in sentences. Our BERT-based classifiers assign labels referring to components and aspects to sentences in reviews and extract sentences with comments on specific components and aspects. We determined proper labels based for the words identified through pattern matching from product reviews to create the training data. Because we could not use the words as labels, we carefully created labels covering the meanings of the words. However, the training data was imbalanced on component and aspect pairs. We introduced a data augmentation method using WordNet to reduce the bias. Our evaluation demonstrates that the system can determine labels for road bikes using pattern matching, covering more than 88\% of the indicators of components and aspects on e-commerce sites. Moreover, our data augmentation method can improve the-F1-measure on insufficient data from 0.66 to 0.76.
2022-07-13 10:10:30 - A Reinforcement Learning-based Offensive semantics Censorship System for Chatbots - *Shaokang Cai, Dezhi Han, Zibin Zheng, Dun Li, NoelCrespi* - `2207.10569v1` - [abs](http://arxiv.org/abs/2207.10569v1) - [pdf](http://arxiv.org/pdf/2207.10569v1) > The rapid development of artificial intelligence (AI) technology has enabled large-scale AI applications to land in the market and practice. However, while AI technology has brought many conveniences to people in the productization process, it has also exposed many security issues. Especially, attacks against online learning vulnerabilities of chatbots occur frequently. Therefore, this paper proposes a semantics censorship chatbot system based on reinforcement learning, which is mainly composed of two parts: the Offensive semantics censorship model and the semantics purification model. Offensive semantics review can combine the context of user input sentences to detect the rapid evolution of Offensive semantics and respond to Offensive semantics responses. The semantics purification model For the case of chatting robot models, it has been contaminated by large numbers of offensive semantics, by strengthening the offensive reply learned by the learning algorithm, rather than rolling back to the early versions. In addition, by integrating a once-through learning approach, the speed of semantics purification is accelerated while reducing the impact on the quality of replies. The experimental results show that our proposed approach reduces the probability of the chat model generating offensive replies and that the integration of the few-shot learning algorithm improves the training speed rapidly while effectively slowing down the decline in BLEU values.
2022-07-13 10:25:23 - Automated Detection of Label Errors in Semantic Segmentation Datasets via Deep Learning and Uncertainty Quantification - *Matthias Rottmann, Marco Reese* - `2207.06104v1` - [abs](http://arxiv.org/abs/2207.06104v1) - [pdf](http://arxiv.org/pdf/2207.06104v1) > In this work, we for the first time present a method for detecting label errors in image datasets with semantic segmentation, i.e., pixel-wise class labels. Annotation acquisition for semantic segmentation datasets is time-consuming and requires plenty of human labor. In particular, review processes are time consuming and label errors can easily be overlooked by humans. The consequences are biased benchmarks and in extreme cases also performance degradation of deep neural networks (DNNs) trained on such datasets. DNNs for semantic segmentation yield pixel-wise predictions, which makes detection of label errors via uncertainty quantification a complex task. Uncertainty is particularly pronounced at the transitions between connected components of the prediction. By lifting the consideration of uncertainty to the level of predicted components, we enable the usage of DNNs together with component-level uncertainty quantification for the detection of label errors. We present a principled approach to benchmarking the task of label error detection by dropping labels from the Cityscapes dataset as well from a dataset extracted from the CARLA driving simulator, where in the latter case we have the labels under control. Our experiments show that our approach is able to detect the vast majority of label errors while controlling the number of false label error detections. Furthermore, we apply our method to semantic segmentation datasets frequently used by the computer vision community and present a collection of label errors along with sample statistics.
2022-07-13 11:07:03 - The Free Energy Principle for Perception and Action: A Deep Learning Perspective - *Pietro Mazzaglia, Tim Verbelen, Ozan Çatal, Bart Dhoedt* - `2207.06415v1` - [abs](http://arxiv.org/abs/2207.06415v1) - [pdf](http://arxiv.org/pdf/2207.06415v1) > The free energy principle, and its corollary active inference, constitute a bio-inspired theory that assumes biological agents act to remain in a restricted set of preferred states of the world, i.e., they minimize their free energy. Under this principle, biological agents learn a generative model of the world and plan actions in the future that will maintain the agent in an homeostatic state that satisfies its preferences. This framework lends itself to being realized in silico, as it comprehends important aspects that make it computationally affordable, such as variational inference and amortized planning. In this work, we investigate the tool of deep learning to design and realize artificial agents based on active inference, presenting a deep-learning oriented presentation of the free energy principle, surveying works that are relevant in both machine learning and active inference areas, and discussing the design choices that are involved in the implementation process. This manuscript probes newer perspectives for the active inference framework, grounding its theoretical aspects into more pragmatic affairs, offering a practical guide to active inference newcomers and a starting point for deep learning practitioners that would like to investigate implementations of the free energy principle.
2022-07-13 13:34:35 - Simulation-guided Beam Search for Neural Combinatorial Optimization - *Jinho Choo, Yeong-Dae Kwon, Jihoon Kim, Jeongwoo Jae, André Hottung, Kevin Tierney, Youngjune Gwon* - `2207.06190v1` - [abs](http://arxiv.org/abs/2207.06190v1) - [pdf](http://arxiv.org/pdf/2207.06190v1) > Neural approaches for combinatorial optimization (CO) equip a learning mechanism to discover powerful heuristics for solving complex real-world problems. While neural approaches capable of high-quality solutions in a single shot are emerging, state-of-the-art approaches are often unable to take full advantage of the solving time available to them. In contrast, hand-crafted heuristics perform highly effective search well and exploit the computation time given to them, but contain heuristics that are difficult to adapt to a dataset being solved. With the goal of providing a powerful search procedure to neural CO approaches, we propose simulation-guided beam search (SGBS), which examines candidate solutions within a fixed-width tree search that both a neural net-learned policy and a simulation (rollout) identify as promising. We further hybridize SGBS with efficient active search (EAS), where SGBS enhances the quality of solutions backpropagated in EAS, and EAS improves the quality of the policy used in SGBS. We evaluate our methods on well-known CO benchmarks and show that SGBS significantly improves the quality of the solutions found under reasonable runtime assumptions.
2022-07-13 14:01:13 - A survey on computational spectral reconstruction methods from RGB to hyperspectral imaging - *Jingang Zhang, Runmu Su, Wenqi Ren, Qiang Fu, Felix Heide, Yunfeng Nie* - `2106.15944v2` - [abs](http://arxiv.org/abs/2106.15944v2) - [pdf](http://arxiv.org/pdf/2106.15944v2) > Hyperspectral imaging enables versatile applications due to its competence in capturing abundant spatial and spectral information, which are crucial for identifying substances. However, the devices for acquiring hyperspectral images are expensive and complicated. Therefore, many alternative spectral imaging methods have been proposed by directly reconstructing the hyperspectral information from lower-cost, more available RGB images. We present a thorough investigation of these state-of-the-art spectral reconstruction methods from the widespread RGB images. A systematic study and comparison of more than 25 methods has revealed that most of the data-driven deep learning methods are superior to prior-based methods in terms of reconstruction accuracy and quality despite lower speeds. This comprehensive review can serve as a fruitful reference source for peer researchers, thus further inspiring future development directions in related domains.
2022-07-13 14:07:48 - Explainability in Deep Reinforcement Learning, a Review into Current Methods and Applications - *Thomas Hickling, Abdelhafid Zenati, Nabil Aouf, Phillippa Spencer* - `2207.01911v2` - [abs](http://arxiv.org/abs/2207.01911v2) - [pdf](http://arxiv.org/pdf/2207.01911v2) > The use of Deep Reinforcement Learning (DRL) schemes has increased dramatically since their first introduction in 2015. Though uses in many different applications are being found they still have a problem with the lack of interpretability. This has bread a lack of understanding and trust in the use of DRL solutions from researchers and the general public. To solve this problem the field of explainable artificial intelligence (XAI) has emerged. This is a variety of different methods that look to open the DRL black boxes, they range from the use of interpretable symbolic decision trees to numerical methods like Shapley Values. This review looks at which methods are being used and what applications they are being used. This is done to identify which models are the best suited to each application or if a method is being underutilised.
2022-07-13 14:30:20 - Open High-Resolution Satellite Imagery: The WorldStrat Dataset -- With Application to Super-Resolution - *Julien Cornebise, Ivan Oršolić, Freddie Kalaitzis* - `2207.06418v1` - [abs](http://arxiv.org/abs/2207.06418v1) - [pdf](http://arxiv.org/pdf/2207.06418v1) > Analyzing the planet at scale with satellite imagery and machine learning is a dream that has been constantly hindered by the cost of difficult-to-access highly-representative high-resolution imagery. To remediate this, we introduce here the WorldStrat dataset. The largest and most varied such publicly available dataset, at Airbus SPOT 6/7 satellites' high resolution of up to 1.5 m/pixel, empowered by European Space Agency's Phi-Lab as part of the ESA-funded QueryPlanet project, we curate nearly 10,000 sqkm of unique locations to ensure stratified representation of all types of land-use across the world: from agriculture to ice caps, from forests to multiple urbanization densities. We also enrich those with locations typically under-represented in ML datasets: sites of humanitarian interest, illegal mining sites, and settlements of persons at risk. We temporally-match each high-resolution image with multiple low-resolution images from the freely accessible lower-resolution Sentinel-2 satellites at 10 m/pixel. We accompany this dataset with an open-source Python package to: rebuild or extend the WorldStrat dataset, train and infer baseline algorithms, and learn with abundant tutorials, all compatible with the popular EO-learn toolbox. We hereby hope to foster broad-spectrum applications of ML to satellite imagery, and possibly develop from free public low-resolution Sentinel2 imagery the same power of analysis allowed by costly private high-resolution imagery. We illustrate this specific point by training and releasing several highly compute-efficient baselines on the task of Multi-Frame Super-Resolution. High-resolution Airbus imagery is CC BY-NC, while the labels and Sentinel2 imagery are CC BY, and the source code and pre-trained models under BSD. The dataset is available at https://zenodo.org/record/6810792 and the software package at https://github.com/worldstrat/worldstrat .
2022-07-13 14:31:46 - Explainable Intrusion Detection Systems (X-IDS): A Survey of Current Methods, Challenges, and Opportunities - *Subash Neupane, Jesse Ables, William Anderson, Sudip Mittal, Shahram Rahimi, Ioana Banicescu, Maria Seale* - `2207.06236v1` - [abs](http://arxiv.org/abs/2207.06236v1) - [pdf](http://arxiv.org/pdf/2207.06236v1) > The application of Artificial Intelligence (AI) and Machine Learning (ML) to cybersecurity challenges has gained traction in industry and academia, partially as a result of widespread malware attacks on critical systems such as cloud infrastructures and government institutions. Intrusion Detection Systems (IDS), using some forms of AI, have received widespread adoption due to their ability to handle vast amounts of data with a high prediction accuracy. These systems are hosted in the organizational Cyber Security Operation Center (CSoC) as a defense tool to monitor and detect malicious network flow that would otherwise impact the Confidentiality, Integrity, and Availability (CIA). CSoC analysts rely on these systems to make decisions about the detected threats. However, IDSs designed using Deep Learning (DL) techniques are often treated as black box models and do not provide a justification for their predictions. This creates a barrier for CSoC analysts, as they are unable to improve their decisions based on the model's predictions. One solution to this problem is to design explainable IDS (X-IDS). This survey reviews the state-of-the-art in explainable AI (XAI) for IDS, its current challenges, and discusses how these challenges span to the design of an X-IDS. In particular, we discuss black box and white box approaches comprehensively. We also present the tradeoff between these approaches in terms of their performance and ability to produce explanations. Furthermore, we propose a generic architecture that considers human-in-the-loop which can be used as a guideline when designing an X-IDS. Research recommendations are given from three critical viewpoints: the need to define explainability for IDS, the need to create explanations tailored to various stakeholders, and the need to design metrics to evaluate explanations.
2022-07-13 15:35:22 - Human-AI Collaboration in Decision-Making: Beyond Learning to Defer - *Diogo Leitão, Pedro Saleiro, Mário A. T. Figueiredo, Pedro Bizarro* - `2206.13202v2` - [abs](http://arxiv.org/abs/2206.13202v2) - [pdf](http://arxiv.org/pdf/2206.13202v2) > Human-AI collaboration (HAIC) in decision-making aims to create synergistic teaming between human decision-makers and AI systems. Learning to defer (L2D) has been presented as a promising framework to determine who among humans and AI should make which decisions in order to optimize the performance and fairness of the combined system. Nevertheless, L2D entails several often unfeasible requirements, such as the availability of predictions from humans for every instance or ground-truth labels that are independent from said humans. Furthermore, neither L2D nor alternative approaches tackle fundamental issues of deploying HAIC systems in real-world settings, such as capacity management or dealing with dynamic environments. In this paper, we aim to identify and review these and other limitations, pointing to where opportunities for future research in HAIC may lie.
2022-07-13 17:52:38 - Open set learning with augmented category by exploiting unlabelled data (open-LACU) - *Emile R. Engelbrecht, Johan A. du Preez* - `2002.01368v4` - [abs](http://arxiv.org/abs/2002.01368v4) - [pdf](http://arxiv.org/pdf/2002.01368v4) > Considering the nature of unlabelled data, it is common for partially labelled training datasets to contain samples that belong to novel categories. Although these so-called observed novel categories exist in the training data, they do not belong to any of the training labels. In contrast, open-sets define novel categories as those unobserved during during training, but present during testing. This research is the first to generalize between observed and unobserved novel categories within a new learning policy called open-set learning with augmented category by exploiting unlabeled data or open-LACU. This study conducts a high-level review on novelty detection so to differentiate between research fields that concern observed novel categories, and the research fields that concern unobserved novel categories. Open-LACU is then introduced as a synthesis of the relevant fields to maintain the advantages of each within a single learning policy. Currently, we are finalising the first open-LACU network which will be combined with this pre-print to be sent for publication.
2022-07-14 05:27:16 - Golden Reference-Free Hardware Trojan Localization using Graph Convolutional Network - *Rozhin Yasaei, Sina Faezi, Mohammad Abdullah Al Faruque* - `2207.06664v1` - [abs](http://arxiv.org/abs/2207.06664v1) - [pdf](http://arxiv.org/pdf/2207.06664v1) > The globalization of the Integrated Circuit (IC) supply chain has moved most of the design, fabrication, and testing process from a single trusted entity to various untrusted third-party entities worldwide. The risk of using untrusted third-Party Intellectual Property (3PIP) is the possibility for adversaries to insert malicious modifications known as Hardware Trojans (HTs). These HTs can compromise the integrity, deteriorate the performance, deny the service, and alter the functionality of the design. While numerous HT detection methods have been proposed in the literature, the crucial task of HT localization is overlooked. Moreover, a few existing HT localization methods have several weaknesses: reliance on a golden reference, inability to generalize for all types of HT, lack of scalability, low localization resolution, and manual feature engineering/property definition. To overcome their shortcomings, we propose a novel, golden reference-free HT localization method at the pre-silicon stage by leveraging Graph Convolutional Network (GCN). In this work, we convert the circuit design to its intrinsic data structure, graph and extract the node attributes. Afterward, the graph convolution performs automatic feature extraction for nodes to classify the nodes as Trojan or benign. Our automated approach does not burden the designer with manual code review. It locates the Trojan signals with 99.6% accuracy, 93.1% F1-score, and a false-positive rate below 0.009%.
2022-07-14 09:40:34 - An Empirical Evaluation of Four Off-the-Shelf Proprietary Visual-Inertial Odometry Systems - *Jungha Kim, Minkyeong Song, Yeoeun Lee, Moonkyeong Jung, Pyojin Kim* - `2207.06780v1` - [abs](http://arxiv.org/abs/2207.06780v1) - [pdf](http://arxiv.org/pdf/2207.06780v1) > Commercial visual-inertial odometry (VIO) systems have been gaining attention as cost-effective, off-the-shelf six degrees of freedom (6-DoF) ego-motion tracking methods for estimating accurate and consistent camera pose data, in addition to their ability to operate without external localization from motion capture or global positioning systems. It is unclear from existing results, however, which commercial VIO platforms are the most stable, consistent, and accurate in terms of state estimation for indoor and outdoor robotic applications. We assess four popular proprietary VIO systems (Apple ARKit, Google ARCore, Intel RealSense T265, and Stereolabs ZED 2) through a series of both indoor and outdoor experiments where we show their positioning stability, consistency, and accuracy. We present our complete results as a benchmark comparison for the research community.
2022-07-14 09:42:05 - Big Data Testing Techniques: Taxonomy, Challenges and Future Trends - *Iram Arshad, Saeed Hamood Alsamhi, Wasif Afzal* - `2111.02853v4` - [abs](http://arxiv.org/abs/2111.02853v4) - [pdf](http://arxiv.org/pdf/2111.02853v4) > Big Data is reforming many industrial domains by providing decision support through analyzing large data volumes. Big Data testing aims to ensure that Big Data systems run smoothly and error-free while maintaining the performance and quality of data. However, because of the diversity and complexity of data, testing Big Data is challenging. Though numerous research efforts deal with Big Data testing, a comprehensive review to address testing techniques and challenges of Big Data is not available as yet. Therefore, we have systematically reviewed the Big Data testing techniques evidence occurring in the period 2010-2021. This paper discusses testing data processing by highlighting the techniques used in every processing phase. Furthermore, we discuss the challenges and future directions. Our findings show that diverse functional, non-functional and combined (functional and non-functional) testing techniques have been used to solve specific problems related to Big Data. At the same time, most of the testing challenges have been faced during the MapReduce validation phase. In addition, the combinatorial testing technique is one of the most applied techniques in combination with other techniques (i.e., random testing, mutation testing, input space partitioning and equivalence testing) to find various functional faults through Big Data testing.
2022-07-14 10:16:56 - Explainable Sparse Knowledge Graph Completion via High-order Graph Reasoning Network - *Weijian Chen, Yixin Cao, Fuli Feng, Xiangnan He, Yongdong Zhang* - `2207.07503v1` - [abs](http://arxiv.org/abs/2207.07503v1) - [pdf](http://arxiv.org/pdf/2207.07503v1) > Knowledge Graphs (KGs) are becoming increasingly essential infrastructures in many applications while suffering from incompleteness issues. The KG completion task (KGC) automatically predicts missing facts based on an incomplete KG. However, existing methods perform unsatisfactorily in real-world scenarios. On the one hand, their performance will dramatically degrade along with the increasing sparsity of KGs. On the other hand, the inference procedure for prediction is an untrustworthy black box. This paper proposes a novel explainable model for sparse KGC, compositing high-order reasoning into a graph convolutional network, namely HoGRN. It can not only improve the generalization ability to mitigate the information insufficiency issue but also provide interpretability while maintaining the model's effectiveness and efficiency. There are two main components that are seamlessly integrated for joint optimization. First, the high-order reasoning component learns high-quality relation representations by capturing endogenous correlation among relations. This can reflect logical rules to justify a broader of missing facts. Second, the entity updating component leverages a weight-free Graph Convolutional Network (GCN) to efficiently model KG structures with interpretability. Unlike conventional methods, we conduct entity aggregation and design composition-based attention in the relational space without additional parameters. The lightweight design makes HoGRN better suitable for sparse settings. For evaluation, we have conducted extensive experiments-the results of HoGRN on several sparse KGs present impressive improvements (9% MRR gain on average). Further ablation and case studies demonstrate the effectiveness of the main components. Our codes will be released upon acceptance.
2022-07-14 11:21:19 - Tutorial on the development of AI models for medical image analysis - *Thijs Kooi* - `2208.00766v1` - [abs](http://arxiv.org/abs/2208.00766v1) - [pdf](http://arxiv.org/pdf/2208.00766v1) > The idea of using computers to read medical scans was introduced as early as 1966. However, limits to machine learning technology meant progress was slow initially. The Alexnet breakthrough in 2012 sparked new interest in the topic, which resulted in the release of 100s of medical AI solutions on the market. In spite of success for some diseases and modalities, many challenges remain. Research typically focuses on the development of specific applications or techniques, clinical evaluation, or meta analysis of clinical studies or techniques through surveys or challenges. However, limited attention has been given to the development process of improving real world performance. In this tutorial, we address the latter and discuss some techniques to conduct the development process in order to make this as efficient as possible.
2022-07-14 11:47:17 - RGB-D Salient Object Detection: A Survey - *Tao Zhou, Deng-Ping Fan, Ming-Ming Cheng, Jianbing Shen, Ling Shao* - `2008.00230v4` - [abs](http://arxiv.org/abs/2008.00230v4) - [pdf](http://arxiv.org/pdf/2008.00230v4) > Salient object detection (SOD), which simulates the human visual perception system to locate the most attractive object(s) in a scene, has been widely applied to various computer vision tasks. Now, with the advent of depth sensors, depth maps with affluent spatial information that can be beneficial in boosting the performance of SOD, can easily be captured. Although various RGB-D based SOD models with promising performance have been proposed over the past several years, an in-depth understanding of these models and challenges in this topic remains lacking. In this paper, we provide a comprehensive survey of RGB-D based SOD models from various perspectives, and review related benchmark datasets in detail. Further, considering that the light field can also provide depth maps, we review SOD models and popular benchmark datasets from this domain as well. Moreover, to investigate the SOD ability of existing models, we carry out a comprehensive evaluation, as well as attribute-based evaluation of several representative RGB-D based SOD models. Finally, we discuss several challenges and open directions of RGB-D based SOD for future research. All collected models, benchmark datasets, source code links, datasets constructed for attribute-based evaluation, and codes for evaluation will be made publicly available at https://github.com/taozh2017/RGBDSODsurvey
2022-07-14 16:44:59 - Leakage and the Reproducibility Crisis in ML-based Science - *Sayash Kapoor, Arvind Narayanan* - `2207.07048v1` - [abs](http://arxiv.org/abs/2207.07048v1) - [pdf](http://arxiv.org/pdf/2207.07048v1) > The use of machine learning (ML) methods for prediction and forecasting has become widespread across the quantitative sciences. However, there are many known methodological pitfalls, including data leakage, in ML-based science. In this paper, we systematically investigate reproducibility issues in ML-based science. We show that data leakage is indeed a widespread problem and has led to severe reproducibility failures. Specifically, through a survey of literature in research communities that adopted ML methods, we find 17 fields where errors have been found, collectively affecting 329 papers and in some cases leading to wildly overoptimistic conclusions. Based on our survey, we present a fine-grained taxonomy of 8 types of leakage that range from textbook errors to open research problems. We argue for fundamental methodological changes to ML-based science so that cases of leakage can be caught before publication. To that end, we propose model info sheets for reporting scientific claims based on ML models that would address all types of leakage identified in our survey. To investigate the impact of reproducibility errors and the efficacy of model info sheets, we undertake a reproducibility study in a field where complex ML models are believed to vastly outperform older statistical models such as Logistic Regression (LR): civil war prediction. We find that all papers claiming the superior performance of complex ML models compared to LR models fail to reproduce due to data leakage, and complex ML models don't perform substantively better than decades-old LR models. While none of these errors could have been caught by reading the papers, model info sheets would enable the detection of leakage in each case.
2022-07-14 18:56:54 - Session-based Cyberbullying Detection in Social Media: A Survey - *Peiling Yi, Arkaitz Zubiaga* - `2207.10639v1` - [abs](http://arxiv.org/abs/2207.10639v1) - [pdf](http://arxiv.org/pdf/2207.10639v1) > Cyberbullying is a pervasive problem in online social media, where a bully abuses a victim through a social media session. By investigating cyberbullying perpetrated through social media sessions, recent research has looked into mining patterns and features for modeling and understanding the two defining characteristics of cyberbullying: repetitive behavior and power imbalance. In this survey paper, we define the Session-based Cyberbullying Detection framework that encapsulates the different steps and challenges of the problem. Based on this framework, we provide a comprehensive overview of session-based cyberbullying detection in social media, delving into existing efforts from a data and methodological perspective. Our review leads us to propose evidence-based criteria for a set of best practices to create session-based cyberbullying datasets. In addition, we perform benchmark experiments comparing the performance of state-of-the-art session-based cyberbullying detection models as well as large pre-trained language models across two different datasets. Through our review, we also put forth a set of open challenges as future research directions.
2022-07-15 04:16:21 - Rethinking Task Sampling for Few-shot Vision-Language Transfer Learning - *Zhenhailong Wang, Hang Yu, Manling Li, Han Zhao, Heng Ji* - `2203.04904v3` - [abs](http://arxiv.org/abs/2203.04904v3) - [pdf](http://arxiv.org/pdf/2203.04904v3) > Despite achieving state-of-the-art zero-shot performance, existing vision-language models still fall short of few-shot transfer ability on domain-specific problems. Classical fine-tuning often fails to prevent highly expressive models from exploiting spurious correlations. Although model-agnostic meta-learning (MAML) presents as a natural alternative for few-shot transfer learning, the expensive computation due to implicit second-order optimization limits its use on large-scale vision-language models such as CLIP. While much literature has been devoted to exploring alternative optimization strategies, we identify another essential aspect towards effective few-shot transfer learning, task sampling, which is previously only be viewed as part of data pre-processing in MAML. To show the impact of task sampling, we propose a simple algorithm, Model-Agnostic Multitask Fine-tuning (MAMF), which differentiates classical fine-tuning only on uniformly sampling multiple tasks. Despite its simplicity, we show that MAMF consistently outperforms classical fine-tuning on five few-shot vision-language classification tasks. We further show that the effectiveness of the bi-level optimization in MAML is highly sensitive to the zero-shot performance of a task in the context of few-shot vision-language classification. The goal of this paper is to provide new insights on what makes few-shot learning work, and encourage more research into investigating better task sampling strategies.
2022-07-15 07:02:55 - Sequence-aware multimodal page classification of Brazilian legal documents - *Pedro H. Luz de Araujo, Ana Paula G. S. de Almeida, Fabricio A. Braz, Nilton C. da Silva, Flavio de Barros Vidal, Teofilo E. de Campos* - `2207.00748v2` - [abs](http://arxiv.org/abs/2207.00748v2) - [pdf](http://arxiv.org/pdf/2207.00748v2) > The Brazilian Supreme Court receives tens of thousands of cases each semester. Court employees spend thousands of hours to execute the initial analysis and classification of those cases -- which takes effort away from posterior, more complex stages of the case management workflow. In this paper, we explore multimodal classification of documents from Brazil's Supreme Court. We train and evaluate our methods on a novel multimodal dataset of 6,510 lawsuits (339,478 pages) with manual annotation assigning each page to one of six classes. Each lawsuit is an ordered sequence of pages, which are stored both as an image and as a corresponding text extracted through optical character recognition. We first train two unimodal classifiers: a ResNet pre-trained on ImageNet is fine-tuned on the images, and a convolutional network with filters of multiple kernel sizes is trained from scratch on document texts. We use them as extractors of visual and textual features, which are then combined through our proposed Fusion Module. Our Fusion Module can handle missing textual or visual input by using learned embeddings for missing data. Moreover, we experiment with bi-directional Long Short-Term Memory (biLSTM) networks and linear-chain conditional random fields to model the sequential nature of the pages. The multimodal approaches outperform both textual and visual classifiers, especially when leveraging the sequential nature of the pages.
2022-07-15 07:33:48 - Are We Ready for Robust and Resilient SLAM? A Framework For Quantitative Characterization of SLAM Datasets - *Islam Ali, Hong Zhang* - `2202.11312v2` - [abs](http://arxiv.org/abs/2202.11312v2) - [pdf](http://arxiv.org/pdf/2202.11312v2) > Reliability of SLAM systems is considered one of the critical requirements in modern autonomous systems. This directed the efforts to developing many state-of-the-art systems, creating challenging datasets, and introducing rigorous metrics to measure SLAM performance. However, the link between datasets and performance in the robustness/resilience context has rarely been explored. In order to fill this void, characterization of the operating conditions of SLAM systems is essential in order to provide an environment for quantitative measurement of robustness and resilience. In this paper, we argue that for proper evaluation of SLAM performance, the characterization of SLAM datasets serves as a critical first step. The study starts by reviewing previous efforts for quantitative characterization of SLAM datasets. Then, the problem of perturbation characterization is discussed and the linkage to SLAM robustness/resilience is established. After that, we propose a novel, generic and extendable framework for quantitative analysis and comparison of SLAM datasets. Additionally, a description of different characterization parameters is provided. Finally, we demonstrate the application of our framework by presenting the characterization results of three SLAM datasets: KITTI, EuroC-MAV, and TUM-VI highlighting the level of insights achieved by the proposed framework.
2022-07-15 14:09:04 - A Systematic Review and Replicability Study of BERT4Rec for Sequential Recommendation - *Aleksandr Petrov, Craig Macdonald* - `2207.07483v1` - [abs](http://arxiv.org/abs/2207.07483v1) - [pdf](http://arxiv.org/pdf/2207.07483v1) > BERT4Rec is an effective model for sequential recommendation based on the Transformer architecture. In the original publication, BERT4Rec claimed superiority over other available sequential recommendation approaches (e.g. SASRec), and it is now frequently being used as a state-of-the art baseline for sequential recommendations. However, not all subsequent publications confirmed this result and proposed other models that were shown to outperform BERT4Rec in effectiveness. In this paper we systematically review all publications that compare BERT4Rec with another popular Transformer-based model, namely SASRec, and show that BERT4Rec results are not consistent within these publications. To understand the reasons behind this inconsistency, we analyse the available implementations of BERT4Rec and show that we fail to reproduce results of the original BERT4Rec publication when using their default configuration parameters. However, we are able to replicate the reported results with the original code if training for a much longer amount of time (up to 30x) compared to the default configuration. We also propose our own implementation of BERT4Rec based on the Hugging Face Transformers library, which we demonstrate replicates the originally reported results on 3 out 4 datasets, while requiring up to 95% less training time to converge. Overall, from our systematic review and detailed experiments, we conclude that BERT4Rec does indeed exhibit state-of-the-art effectiveness for sequential recommendation, but only when trained for a sufficient amount of time. Additionally, we show that our implementation can further benefit from adapting other Transformer architectures that are available in the Hugging Face Transformers library (e.g. using disentangled attention, as provided by DeBERTa, or larger hidden layer size cf. ALBERT).
2022-07-15 16:15:46 - Reasoning about Actions over Visual and Linguistic Modalities: A Survey - *Shailaja Keyur Sampat, Maitreya Patel, Subhasish Das, Yezhou Yang, Chitta Baral* - `2207.07568v1` - [abs](http://arxiv.org/abs/2207.07568v1) - [pdf](http://arxiv.org/pdf/2207.07568v1) > 'Actions' play a vital role in how humans interact with the world and enable them to achieve desired goals. As a result, most common sense (CS) knowledge for humans revolves around actions. While 'Reasoning about Actions & Change' (RAC) has been widely studied in the Knowledge Representation community, it has recently piqued the interest of NLP and computer vision researchers. This paper surveys existing tasks, benchmark datasets, various techniques and models, and their respective performance concerning advancements in RAC in the vision and language domain. Towards the end, we summarize our key takeaways, discuss the present challenges facing this research area, and outline potential directions for future research.
2022-07-15 16:50:11 - Mobile Keystroke Biometrics Using Transformers - *Giuseppe Stragapede, Paula Delgado-Santos, Ruben Tolosana, Ruben Vera-Rodriguez, Richard Guest, Aythami Morales* - `2207.07596v1` - [abs](http://arxiv.org/abs/2207.07596v1) - [pdf](http://arxiv.org/pdf/2207.07596v1) > Behavioural biometrics have proven to be effective against identity theft as well as be considered user-friendly authentication methods. One of the most popular traits in the literature is keystroke dynamics due to the large deployment of computers and mobile devices in our society. This paper focuses on improving keystroke biometric systems on the free-text scenario. This scenario is characterised as very challenging due to the uncontrolled text conditions, the influential of the user's emotional and physical state, and the in-use application. To overcome these drawbacks, methods based on deep learning such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have been proposed in the literature, outperforming traditional machine learning methods. However, these architectures still have aspects that need to be reviewed and improved. To the best of our knowledge, this is the first study that proposes keystroke biometric systems based on Transformers. The proposed Transformer architecture has achieved Equal Error Rate (EER) values of 3.84% in the popular Aalto mobile keystroke database using only 5 enrolment sessions, outperforming in large margin other state-of-the-art approaches in the literature.
2022-07-15 19:04:43 - Probing Semantic Grounding in Language Models of Code with Representational Similarity Analysis - *Shounak Naik, Rajaswa Patil, Swati Agarwal, Veeky Baths* - `2207.07706v1` - [abs](http://arxiv.org/abs/2207.07706v1) - [pdf](http://arxiv.org/pdf/2207.07706v1) > Representational Similarity Analysis is a method from cognitive neuroscience, which helps in comparing representations from two different sources of data. In this paper, we propose using Representational Similarity Analysis to probe the semantic grounding in language models of code. We probe representations from the CodeBERT model for semantic grounding by using the data from the IBM CodeNet dataset. Through our experiments, we show that current pre-training methods do not induce semantic grounding in language models of code, and instead focus on optimizing form-based patterns. We also show that even a little amount of fine-tuning on semantically relevant tasks increases the semantic grounding in CodeBERT significantly. Our ablations with the input modality to the CodeBERT model show that using bimodal inputs (code and natural language) over unimodal inputs (only code) gives better semantic grounding and sample efficiency during semantic fine-tuning. Finally, our experiments with semantic perturbations in code reveal that CodeBERT is able to robustly distinguish between semantically correct and incorrect code.
2022-07-15 19:53:20 - How to Reuse and Compose Knowledge for a Lifetime of Tasks: A Survey on Continual Learning and Functional Composition - *Jorge A. Mendez, Eric Eaton* - `2207.07730v1` - [abs](http://arxiv.org/abs/2207.07730v1) - [pdf](http://arxiv.org/pdf/2207.07730v1) > A major goal of artificial intelligence (AI) is to create an agent capable of acquiring a general understanding of the world. Such an agent would require the ability to continually accumulate and build upon its knowledge as it encounters new experiences. Lifelong or continual learning addresses this setting, whereby an agent faces a continual stream of problems and must strive to capture the knowledge necessary for solving each new task it encounters. If the agent is capable of accumulating knowledge in some form of compositional representation, it could then selectively reuse and combine relevant pieces of knowledge to construct novel solutions. Despite the intuitive appeal of this simple idea, the literatures on lifelong learning and compositional learning have proceeded largely separately. In an effort to promote developments that bridge between the two fields, this article surveys their respective research landscapes and discusses existing and future connections between them.
2022-07-15 20:33:29 - Human keypoint detection for close proximity human-robot interaction - *Jan Docekal, Jakub Rozlivek, Jiri Matas, Matej Hoffmann* - `2207.07742v1` - [abs](http://arxiv.org/abs/2207.07742v1) - [pdf](http://arxiv.org/pdf/2207.07742v1) > We study the performance of state-of-the-art human keypoint detectors in the context of close proximity human-robot interaction. The detection in this scenario is specific in that only a subset of body parts such as hands and torso are in the field of view. In particular, (i) we survey existing datasets with human pose annotation from the perspective of close proximity images and prepare and make publicly available a new Human in Close Proximity (HiCP) dataset; (ii) we quantitatively and qualitatively compare state-of-the-art human whole-body 2D keypoint detection methods (OpenPose, MMPose, AlphaPose, Detectron2) on this dataset; (iii) since accurate detection of hands and fingers is critical in applications with handovers, we evaluate the performance of the MediaPipe hand detector; (iv) we deploy the algorithms on a humanoid robot with an RGB-D camera on its head and evaluate the performance in 3D human keypoint detection. A motion capture system is used as reference. The best performing whole-body keypoint detectors in close proximity were MMPose and AlphaPose, but both had difficulty with finger detection. Thus, we propose a combination of MMPose or AlphaPose for the body and MediaPipe for the hands in a single framework providing the most accurate and robust detection. We also analyse the failure modes of individual detectors -- for example, to what extent the absence of the head of the person in the image degrades performance. Finally, we demonstrate the framework in a scenario where a humanoid robot interacting with a person uses the detected 3D keypoints for whole-body avoidance maneuvers.
2022-07-16 01:27:59 - A Survey of Vision-Language Pre-Trained Models - *Yifan Du, Zikang Liu, Junyi Li, Wayne Xin Zhao* - `2202.10936v2` - [abs](http://arxiv.org/abs/2202.10936v2) - [pdf](http://arxiv.org/pdf/2202.10936v2) > As transformer evolves, pre-trained models have advanced at a breakneck pace in recent years. They have dominated the mainstream techniques in natural language processing (NLP) and computer vision (CV). How to adapt pre-training to the field of Vision-and-Language (V-L) learning and improve downstream task performance becomes a focus of multimodal learning. In this paper, we review the recent progress in Vision-Language Pre-Trained Models (VL-PTMs). As the core content, we first briefly introduce several ways to encode raw images and texts to single-modal embeddings before pre-training. Then, we dive into the mainstream architectures of VL-PTMs in modeling the interaction between text and image representations. We further present widely-used pre-training tasks, and then we introduce some common downstream tasks. We finally conclude this paper and present some promising research directions. Our survey aims to provide researchers with synthesis and pointer to related research.
2022-07-16 16:04:01 - A Survey of Decision Making in Adversarial Games - *Xiuxian Li, Min Meng, Yiguang Hong, Jie Chen* - `2207.07971v1` - [abs](http://arxiv.org/abs/2207.07971v1) - [pdf](http://arxiv.org/pdf/2207.07971v1) > Game theory has by now found numerous applications in various fields, including economics, industry, jurisprudence, and artificial intelligence, where each player only cares about its own interest in a noncooperative or cooperative manner, but without obvious malice to other players. However, in many practical applications, such as poker, chess, evader pursuing, drug interdiction, coast guard, cyber-security, and national defense, players often have apparently adversarial stances, that is, selfish actions of each player inevitably or intentionally inflict loss or wreak havoc on other players. Along this line, this paper provides a systematic survey on three main game models widely employed in adversarial games, i.e., zero-sum normal-form and extensive-form games, Stackelberg (security) games, zero-sum differential games, from an array of perspectives, including basic knowledge of game models, (approximate) equilibrium concepts, problem classifications, research frontiers, (approximate) optimal strategy seeking techniques, prevailing algorithms, and practical applications. Finally, promising future research directions are also discussed for relevant adversarial games.
2022-07-16 20:37:46 - Meta-Referential Games to Learn Compositional Learning Behaviours - *Kevin Denamganaï, Sondess Missaoui, James Alfred Walker* - `2207.08012v1` - [abs](http://arxiv.org/abs/2207.08012v1) - [pdf](http://arxiv.org/pdf/2207.08012v1) > Human beings use compositionality to generalise from past experiences to actual or fictive, novel experiences. To do so, we separate our experiences into fundamental atomic components. These atomic components can then be recombined in novel ways to support our ability to imagine and engage with novel experiences. We frame this as the ability to learn to generalise compositionally. And, we will refer to behaviours making use of this ability as compositional learning behaviours (CLBs). A central problem to learning CLBs is the resolution of a binding problem (BP) (by learning to, firstly, segregate the supportive stimulus components from the observation of multiple stimuli, and then, combine them in a single episodic experience). While it is another feat of intelligence that human beings perform with ease, it is not the case for state-of-the-art artificial agents. Thus, in order to build artificial agents able to collaborate with human beings, we propose to develop a novel benchmark to investigate agents' abilities to exhibit CLBs by solving a domain-agnostic version of the BP. We take inspiration from the language emergence and grounding framework of referential games and propose a meta-learning extension of referential games, entitled Meta-Referential Games, and use this framework to build our benchmark, that we name Symbolic Behaviour Benchmark (S2B). While it has the potential to test for more symbolic behaviours, rather than solely CLBs, in the present paper, though, we solely focus on the single-agent language grounding task that tests for CLBs. We provide baseline results for it, using state-of-the-art RL agents, and show that our proposed benchmark is a compelling challenge that we hope will spur the research community towards developing more capable artificial agents.
2022-07-17 11:24:44 - Can large language models reason about medical questions? - *Valentin Liévin, Christoffer Egeberg Hother, Ole Winther* - `2207.08143v1` - [abs](http://arxiv.org/abs/2207.08143v1) - [pdf](http://arxiv.org/pdf/2207.08143v1) > Although large language models (LLMs) often produce impressive outputs, they also fail to reason and be factual. We set out to investigate how these limitations affect the LLM's ability to answer and reason about difficult real-world based questions. We applied the human-aligned GPT-3 (InstructGPT) to answer multiple-choice medical exam questions (USMLE and MedMCQA) and medical research questions (PubMedQA). We investigated Chain-of-thought (think step by step) prompts, grounding (augmenting the prompt with search results) and few-shot (prepending the question with question-answer exemplars). For a subset of the USMLE questions, a medical domain expert reviewed and annotated the model's reasoning. Overall, GPT-3 achieved a substantial improvement in state-of-the-art machine learning performance. We observed that GPT-3 is often knowledgeable and can reason about medical questions. GPT-3, when confronted with a question it cannot answer, will still attempt to answer, often resulting in a biased predictive distribution. LLMs are not on par with human performance but our results suggest the emergence of reasoning patterns that are compatible with medical problem-solving. We speculate that scaling model and data, enhancing prompt alignment and allowing for better contextualization of the completions will be sufficient for LLMs to reach human-level performance on this type of task.
2022-07-17 11:52:52 - Improving Deep Neural Network Random Initialization Through Neuronal Rewiring - *Leonardo Scabini, Bernard De Baets, Odemir M. Bruno* - `2207.08148v1` - [abs](http://arxiv.org/abs/2207.08148v1) - [pdf](http://arxiv.org/pdf/2207.08148v1) > The deep learning literature is continuously updated with new architectures and training techniques. However, weight initialization is overlooked by most recent research, despite some intriguing findings regarding random weights. On the other hand, recent works have been approaching Network Science to understand the structure and dynamics of Artificial Neural Networks (ANNs) after training. Therefore, in this work, we analyze the centrality of neurons in randomly initialized networks. We show that a higher neuronal strength variance may decrease performance, while a lower neuronal strength variance usually improves it. A new method is then proposed to rewire neuronal connections according to a preferential attachment (PA) rule based on their strength, which significantly reduces the strength variance of layers initialized by common methods. In this sense, PA rewiring only reorganizes connections, while preserving the magnitude and distribution of the weights. We show through an extensive statistical analysis in image classification that performance is improved in most cases, both during training and testing, when using both simple and complex architectures and learning schedules. Our results show that, aside from the magnitude, the organization of the weights is also relevant for better initialization of deep ANNs.
2022-07-17 12:54:10 - Task-aware Similarity Learning for Event-triggered Time Series - *Shaoyu Dou, Kai Yang, Yang Jiao, Chengbo Qiu, Kui Ren* - `2207.08159v1` - [abs](http://arxiv.org/abs/2207.08159v1) - [pdf](http://arxiv.org/pdf/2207.08159v1) > Time series analysis has achieved great success in diverse applications such as network security, environmental monitoring, and medical informatics. Learning similarities among different time series is a crucial problem since it serves as the foundation for downstream analysis such as clustering and anomaly detection. It often remains unclear what kind of distance metric is suitable for similarity learning due to the complex temporal dynamics of the time series generated from event-triggered sensing, which is common in diverse applications, including automated driving, interactive healthcare, and smart home automation. The overarching goal of this paper is to develop an unsupervised learning framework that is capable of learning task-aware similarities among unlabeled event-triggered time series. From the machine learning vantage point, the proposed framework harnesses the power of both hierarchical multi-scale sequence autoencoders and Gaussian Mixture Model (GMM) to effectively learn the low-dimensional representations from the time series. Finally, the obtained similarity measure can be easily visualized for explaining. The proposed framework aspires to offer a stepping stone that gives rise to a systematic approach to model and learn similarities among a multitude of event-triggered time series. Through extensive qualitative and quantitative experiments, it is revealed that the proposed method outperforms state-of-the-art methods considerably.
2022-07-17 12:59:34 - Natural language processing for clusterization of genes according to their functions - *Vladislav Dordiuk, Ekaterina Demicheva, Fernando Polanco Espino, Konstantin Ushenin* - `2207.08162v1` - [abs](http://arxiv.org/abs/2207.08162v1) - [pdf](http://arxiv.org/pdf/2207.08162v1) > There are hundreds of methods for analysis of data obtained in mRNA-sequencing. The most of them are focused on small number of genes. In this study, we propose an approach that reduces the analysis of several thousand genes to analysis of several clusters. The list of genes is enriched with information from open databases. Then, the descriptions are encoded as vectors using the pretrained language model (BERT) and some text processing approaches. The encoded gene function pass through the dimensionality reduction and clusterization. Aiming to find the most efficient pipeline, 180 cases of pipeline with different methods in the major pipeline steps were analyzed. The performance was evaluated with clusterization indexes and expert review of the results.
2022-07-17 21:02:04 - An Overview of Distant Supervision for Relation Extraction with a Focus on Denoising and Pre-training Methods - *William Hogan* - `2207.08286v1` - [abs](http://arxiv.org/abs/2207.08286v1) - [pdf](http://arxiv.org/pdf/2207.08286v1) > Relation Extraction (RE) is a foundational task of natural language processing. RE seeks to transform raw, unstructured text into structured knowledge by identifying relational information between entity pairs found in text. RE has numerous uses, such as knowledge graph completion, text summarization, question-answering, and search querying. The history of RE methods can be roughly organized into four phases: pattern-based RE, statistical-based RE, neural-based RE, and large language model-based RE. This survey begins with an overview of a few exemplary works in the earlier phases of RE, highlighting limitations and shortcomings to contextualize progress. Next, we review popular benchmarks and critically examine metrics used to assess RE performance. We then discuss distant supervision, a paradigm that has shaped the development of modern RE methods. Lastly, we review recent RE works focusing on denoising and pre-training methods.
2022-07-18 04:15:23 - Deep Weakly-Supervised Learning Methods for Classification and Localization in Histology Images: A Survey - *Jérôme Rony, Soufiane Belharbi, Jose Dolz, Ismail Ben Ayed, Luke McCaffrey, Eric Granger* - `1909.03354v6` - [abs](http://arxiv.org/abs/1909.03354v6) - [pdf](http://arxiv.org/pdf/1909.03354v6) > Using deep learning models to diagnose cancer from histology data presents several challenges. Cancer grading and localization of regions of interest (ROIs) in these images normally relies on both image- and pixel-level labels, the latter requiring a costly annotation process. Deep weakly-supervised object localization (WSOL) methods provide different strategies for low-cost training of deep learning models. Using only image-class annotations, these methods can be trained to classify an image, and yield class activation maps (CAMs) for ROI localization. This paper provides a review of state-of-art DL methods for WSOL. We propose a taxonomy where these methods are divided into bottom-up and top-down methods according to the information flow in models. Although the latter have seen limited progress, recent bottom-up methods are currently driving much progress with deep WSOL methods. Early works focused on designing different spatial pooling functions. However, these methods reached limited localization accuracy, and unveiled a major limitation -- the under-activation of CAMs which leads to high false negative localization. Subsequent works aimed to alleviate this issue and recover complete object. Representative methods from our taxonomy are evaluated and compared in terms of classification and localization accuracy on two challenging histology datasets. Overall, the results indicate poor localization performance, particularly for generic methods that were initially designed to process natural images. Methods designed to address the challenges of histology data yielded good results. However, all methods suffer from high false positive/negative localization. Four key challenges are identified for the application of deep WSOL methods in histology -- under/over activation of CAMs, sensitivity to thresholding, and model selection.
2022-07-18 11:37:47 - Classifying COVID-19 vaccine narratives - *Yue Li, Carolina Scarton, Xingyi Song, Kalina Bontcheva* - `2207.08522v1` - [abs](http://arxiv.org/abs/2207.08522v1) - [pdf](http://arxiv.org/pdf/2207.08522v1) > COVID-19 vaccine hesitancy is widespread, despite governments' information campaigns and WHO efforts. One of the reasons behind this is vaccine disinformation which widely spreads in social media. In particular, recent surveys have established that vaccine disinformation is impacting negatively citizen trust in COVID-19 vaccination. At the same time, fact-checkers are struggling with detecting and tracking of vaccine disinformation, due to the large scale of social media. To assist fact-checkers in monitoring vaccine narratives online, this paper studies a new vaccine narrative classification task, which categorises COVID-19 vaccine claims into one of seven categories. Following a data augmentation approach, we first construct a novel dataset for this new classification task, focusing on the minority classes. We also make use of fact-checker annotated data. The paper also presents a neural vaccine narrative classifier that achieves an accuracy of 84% under cross-validation. The classifier is publicly available for researchers and journalists.
2022-07-18 13:47:20 - Abstraction between Structural Causal Models: A Review of Definitions and Properties - *Fabio Massimo Zennaro* - `2207.08603v1` - [abs](http://arxiv.org/abs/2207.08603v1) - [pdf](http://arxiv.org/pdf/2207.08603v1) > Structural causal models (SCMs) are a widespread formalism to deal with causal systems. A recent direction of research has considered the problem of relating formally SCMs at different levels of abstraction, by defining maps between SCMs and imposing a requirement of interventional consistency. This paper offers a review of the solutions proposed so far, focusing on the formal properties of a map between SCMs, and highlighting the different layers (structural, distributional) at which these properties may be enforced. This allows us to distinguish families of abstractions that may or may not be permitted by choosing to guarantee certain properties instead of others. Such an understanding not only allows to distinguish among proposal for causal abstraction with more awareness, but it also allows to tailor the definition of abstraction with respect to the forms of abstraction relevant to specific applications.
2022-07-19 04:41:59 - A Prospective Approach for Human-to-Human Interaction Recognition from Wi-Fi Channel Data using Attention Bidirectional Gated Recurrent Neural Network with GUI Application Implementation - *Md. Mohi Uddin Khan, Abdullah Bin Shams, Md. Mohsin Sarker Raihan* - `2202.08146v3` - [abs](http://arxiv.org/abs/2202.08146v3) - [pdf](http://arxiv.org/pdf/2202.08146v3) > Recent advances in 5G wireless technology and socioeconomic transformation have brought a paradigm shift in sensor applications. Wi-Fi signal demonstrates a strong correlation between its temporal variation and body movements, which can be leveraged to recognize human activity. In this article, we demonstrate the cognitive ability of device free mutual human-to-human interaction recognition method based on the time scale Wi-Fi channel state information. The mutual activities examined are steady-state, approaching, departing, handshaking, high-five, hugging, kicking (left-leg), kicking (right-leg), pointing (left-hand), pointing (right-hand), punching(left-hand), punching (right-hand), and pushing. We explore and propose a Self-Attention furnished Bidirectional Gated Recurrent Neural Network model to classify 13 human-to-human mutual interaction types from the time-series data. Our proposed model can recognize a two subject pair mutual interaction with a maximum benchmark accuracy of 94%. This has been expanded for ten subject pairs, which secured a benchmark accuracy of 88% with improved classification around the interaction-transition region. Also, an executable graphical user interface (GUI) is developed, using the PyQt5 python module, to subsequently display the overall mutual human-interaction recognition procedure in real-time. Finally, we conclude with a brief discourse regarding the possible solutions to the handicaps that resulted in curtailments observed during the study. Such, Wi-Fi channel perturbation pattern analysis is believed to be an efficient, economical and privacy-friendly approach to be potentially utilized in mutual human-interaction recognition for indoor activity monitoring, surveillance system, smart health monitoring systems and independent assisted living.
2022-07-19 05:31:16 - Generalizing Goal-Conditioned Reinforcement Learning with Variational Causal Reasoning - *Wenhao Ding, Haohong Lin, Bo Li, Ding Zhao* - `2207.09081v1` - [abs](http://arxiv.org/abs/2207.09081v1) - [pdf](http://arxiv.org/pdf/2207.09081v1) > As a pivotal component to attaining generalizable solutions in human intelligence, reasoning provides great potential for reinforcement learning (RL) agents' generalization towards varied goals by summarizing part-to-whole arguments and discovering cause-and-effect relations. However, how to discover and represent causalities remains a huge gap that hinders the development of causal RL. In this paper, we augment Goal-Conditioned RL (GCRL) with Causal Graph (CG), a structure built upon the relation between objects and events. We novelly formulate the GCRL problem into variational likelihood maximization with CG as latent variables. To optimize the derived objective, we propose a framework with theoretical performance guarantees that alternates between two steps: using interventional data to estimate the posterior of CG; using CG to learn generalizable models and interpretable policies. Due to the lack of public benchmarks that verify generalization capability under reasoning, we design nine tasks and then empirically show the effectiveness of the proposed method against five baselines on these tasks. Further theoretical analysis shows that our performance improvement is attributed to the virtuous cycle of causal discovery, transition modeling, and policy training, which aligns with the experimental evidence in extensive ablation studies.
2022-07-19 10:11:22 - Urdu Speech and Text Based Sentiment Analyzer - *Waqar Ahmad, Maryam Edalati* - `2207.09163v1` - [abs](http://arxiv.org/abs/2207.09163v1) - [pdf](http://arxiv.org/pdf/2207.09163v1) > Discovering what other people think has always been a key aspect of our information-gathering strategy. People can now actively utilize information technology to seek out and comprehend the ideas of others, thanks to the increased availability and popularity of opinion-rich resources such as online review sites and personal blogs. Because of its crucial function in understanding people's opinions, sentiment analysis (SA) is a crucial task. Existing research, on the other hand, is primarily focused on the English language, with just a small amount of study devoted to low-resource languages. For sentiment analysis, this work presented a new multi-class Urdu dataset based on user evaluations. The tweeter website was used to get Urdu dataset. Our proposed dataset includes 10,000 reviews that have been carefully classified into two categories by human experts: positive, negative. The primary purpose of this research is to construct a manually annotated dataset for Urdu sentiment analysis and to establish the baseline result. Five different lexicon- and rule-based algorithms including Naivebayes, Stanza, Textblob, Vader, and Flair are employed and the experimental results show that Flair with an accuracy of 70% outperforms other tested algorithms.
2022-07-19 10:19:40 - A Streamline-guided De-Homogenization Approach for Structural Design - *Junpeng Wang, Rüdiger Westermann, Jun Wu* - `2207.09172v1` - [abs](http://arxiv.org/abs/2207.09172v1) - [pdf](http://arxiv.org/pdf/2207.09172v1) > We present a novel de-homogenization approach for efficient design of high-resolution load-bearing structures. The proposed approach builds upon a streamline-based parametrization of the design domain, using a set of space-filling and evenly-spaced streamlines in the two mutually orthogonal direction fields that are obtained from homogenization-based topology optimization. Streamlines in these fields are converted into a graph, which is then used to construct a quad-dominant mesh whose edges follow the direction fields. In addition, the edge width is adjusted according to the density and anisotropy of the optimized orthotropic cells. In a number of numerical examples, we demonstrate the mechanical performance and regular appearance of the resulting structural designs, and compare them with those from classic and contemporary approaches.
2022-07-19 10:20:52 - Explainability of deep vision-based autonomous driving systems: Review and challenges - *Éloi Zablocki, Hédi Ben-Younes, Patrick Pérez, Matthieu Cord* - `2101.05307v2` - [abs](http://arxiv.org/abs/2101.05307v2) - [pdf](http://arxiv.org/pdf/2101.05307v2) > This survey reviews explainability methods for vision-based self-driving systems trained with behavior cloning. The concept of explainability has several facets and the need for explainability is strong in driving, a safety-critical application. Gathering contributions from several research fields, namely computer vision, deep learning, autonomous driving, explainable AI (X-AI), this survey tackles several points. First, it discusses definitions, context, and motivation for gaining more interpretability and explainability from self-driving systems, as well as the challenges that are specific to this application. Second, methods providing explanations to a black-box self-driving system in a post-hoc fashion are comprehensively organized and detailed. Third, approaches from the literature that aim at building more interpretable self-driving systems by design are presented and discussed in detail. Finally, remaining open-challenges and potential future research directions are identified and examined.
2022-07-19 15:59:40 - Computer Vision to the Rescue: Infant Postural Symmetry Estimation from Incongruent Annotations - *Xiaofei Huang, Michael Wan, Lingfei Luan, Bethany Tunik, Sarah Ostadabbas* - `2207.09352v1` - [abs](http://arxiv.org/abs/2207.09352v1) - [pdf](http://arxiv.org/pdf/2207.09352v1) > Bilateral postural symmetry plays a key role as a potential risk marker for autism spectrum disorder (ASD) and as a symptom of congenital muscular torticollis (CMT) in infants, but current methods of assessing symmetry require laborious clinical expert assessments. In this paper, we develop a computer vision based infant symmetry assessment system, leveraging 3D human pose estimation for infants. Evaluation and calibration of our system against ground truth assessments is complicated by our findings from a survey of human ratings of angle and symmetry, that such ratings exhibit low inter-rater reliability. To rectify this, we develop a Bayesian estimator of the ground truth derived from a probabilistic graphical model of fallible human raters. We show that the 3D infant pose estimation model can achieve 68% area under the receiver operating characteristic curve performance in predicting the Bayesian aggregate labels, compared to only 61% from a 2D infant pose estimation model and 60% from a 3D adult pose estimation model, highlighting the importance of 3D poses and infant domain knowledge in assessing infant body symmetry. Our survey analysis also suggests that human ratings are susceptible to higher levels of bias and inconsistency, and hence our final 3D pose-based symmetry assessment system is calibrated but not directly supervised by Bayesian aggregate human ratings, yielding higher levels of consistency and lower levels of inter-limb assessment bias.
2022-07-19 18:10:43 - Deep Analysis of Visual Product Reviews - *Chandranath Adak, Soumi Chattopadhyay, Muhammad Saqib* - `2207.09499v1` - [abs](http://arxiv.org/abs/2207.09499v1) - [pdf](http://arxiv.org/pdf/2207.09499v1) > With the proliferation of the e-commerce industry, analyzing customer feedback is becoming indispensable to a service provider. In recent days, it can be noticed that customers upload the purchased product images with their review scores. In this paper, we undertake the task of analyzing such visual reviews, which is very new of its kind. In the past, the researchers worked on analyzing language feedback, but here we do not take any assistance from linguistic reviews that may be absent, since a recent trend can be observed where customers prefer to quickly upload the visual feedback instead of typing language feedback. We propose a hierarchical architecture, where the higher-level model engages in product categorization, and the lower-level model pays attention to predicting the review score from a customer-provided product image. We generated a database by procuring real visual product reviews, which was quite challenging. Our architecture obtained some promising results by performing extensive experiments on the employed database. The proposed hierarchical architecture attained a 57.48% performance improvement over the single-level best comparable architecture.
2022-07-19 21:04:31 - Enhancing Collaborative Filtering Recommender with Prompt-Based Sentiment Analysis - *Elliot Dang, Zheyuan Hu, Tong Li* - `2207.12883v1` - [abs](http://arxiv.org/abs/2207.12883v1) - [pdf](http://arxiv.org/pdf/2207.12883v1) > Collaborative Filtering(CF) recommender is a crucial application in the online market and ecommerce. However, CF recommender has been proven to suffer from persistent problems related to sparsity of the user rating that will further lead to a cold-start issue. Existing methods address the data sparsity issue by applying token-level sentiment analysis that translate text review into sentiment scores as a complement of the user rating. In this paper, we attempt to optimize the sentiment analysis with advanced NLP models including BERT and RoBERTa, and experiment on whether the CF recommender has been further enhanced. We build the recommenders on the Amazon US Reviews dataset, and tune the pretrained BERT and RoBERTa with the traditional fine-tuned paradigm as well as the new prompt-based learning paradigm. Experimental result shows that the recommender enhanced with the sentiment ratings predicted by the fine-tuned RoBERTa has the best performance, and achieved 30.7% overall gain by comparing MAP, NDCG and precision at K to the baseline recommender. Prompt-based learning paradigm, although superior to traditional fine-tune paradigm in pure sentiment analysis, fail to further improve the CF recommender.
2022-07-19 22:41:36 - A detailed introduction to density-based topology optimisation of fluid flow problems with implementation in MATLAB - *Joe Alexandersen* - `2207.13695v1` - [abs](http://arxiv.org/abs/2207.13695v1) - [pdf](http://arxiv.org/pdf/2207.13695v1) > This article presents a detailed introduction to density-based topology optimisation of fluid flow problems. The goal is to allow new students and researchers to quickly get started in the research area and to skip many of the initial steps, often consuming unnecessarily long time from the scientific advancement of the field. This is achieved by providing a step-by-step guide to the components necessary to understand and implement the theory, as well as extending the supplied MATLAB code. The continuous design representation used and how it is connected to the Brinkman penalty approach, for simulating an immersed solid in a fluid domain, is illustrated. The different interpretations of the Brinkman penalty term and how to chose the penalty parameters are explained. The accuracy of the Brinkman penalty approach is analysed through parametric simulations of a reference geometry. The chosen finite element formulation and the solution method is explained. The minimum dissipated energy optimisation problem is defined and how to solve it using an optimality criteria solver and a continuation scheme is discussed. The included MATLAB implementation is documented, with details on the mesh, pre-processing, optimisation and post-processing. The code has two benchmark examples implemented and the application of the code to these is reviewed. Subsequently, several modifications to the code for more complicated examples are presented through provided code modifications and explanations. Lastly, the computational performance of the code is examined through studies of the computational time and memory usage, along with recommendations to decrease computational time through approximations.
2022-07-20 05:51:00 - Self-supervised learning methods and applications in medical imaging analysis: A survey - *Saeed Shurrab, Rehab Duwairi* - `2109.08685v3` - [abs](http://arxiv.org/abs/2109.08685v3) - [pdf](http://arxiv.org/pdf/2109.08685v3) > The scarcity of high-quality annotated medical imaging datasets is a major problem that collides with machine learning applications in the field of medical imaging analysis and impedes its advancement. Self-supervised learning is a recent training paradigm that enables learning robust representations without the need for human annotation which can be considered an effective solution for the scarcity of annotated medical data. This article reviews the state-of-the-art research directions in self-supervised learning approaches for image data with a concentration on their applications in the field of medical imaging analysis. The article covers a set of the most recent self-supervised learning methods from the computer vision field as they are applicable to the medical imaging analysis and categorize them as predictive, generative, and contrastive approaches. Moreover, the article covers 40 of the most recent research papers in the field of self-supervised learning in medical imaging analysis aiming at shedding the light on the recent innovation in the field. Finally, the article concludes with possible future research directions in the field.
2022-07-20 07:32:29 - Analysis of the hands in egocentric vision: A survey - *Andrea Bandini, José Zariffa* - `1912.10867v3` - [abs](http://arxiv.org/abs/1912.10867v3) - [pdf](http://arxiv.org/pdf/1912.10867v3) > Egocentric vision (a.k.a. first-person vision - FPV) applications have thrived over the past few years, thanks to the availability of affordable wearable cameras and large annotated datasets. The position of the wearable camera (usually mounted on the head) allows recording exactly what the camera wearers have in front of them, in particular hands and manipulated objects. This intrinsic advantage enables the study of the hands from multiple perspectives: localizing hands and their parts within the images; understanding what actions and activities the hands are involved in; and developing human-computer interfaces that rely on hand gestures. In this survey, we review the literature that focuses on the hands using egocentric vision, categorizing the existing approaches into: localization (where are the hands or parts of them?); interpretation (what are the hands doing?); and application (e.g., systems that used egocentric hand cues for solving a specific problem). Moreover, a list of the most prominent datasets with hand-based annotations is provided.
2022-07-20 13:37:57 - Face-to-Face Co-Located Human-Human Social Interaction Analysis using Nonverbal Cues: A Survey - *Cigdem Beyan, Alessandro Vinciarelli, Alessio Del Bue* - `2207.10574v1` - [abs](http://arxiv.org/abs/2207.10574v1) - [pdf](http://arxiv.org/pdf/2207.10574v1) > This work presents a systematic review of recent efforts (since 2010) aimed at automatic analysis of nonverbal cues displayed in face-to-face co-located human-human social interactions. The main reason for focusing on nonverbal cues is that these are the physical, machine detectable traces of social and psychological phenomena. Therefore, detecting and understanding nonverbal cues means, at least to a certain extent, to detect and understand social and psychological phenomena. The covered topics are categorized into three as: a) modeling social traits, such as leadership, dominance, personality traits, b) social role recognition and social relations detection and c) interaction dynamics analysis in terms of group cohesion, empathy, rapport and so forth. We target the co-located interactions, in which the interactants are always humans. The survey covers a wide spectrum of settings and scenarios, including free-standing interactions, meetings, indoor and outdoor social exchanges, dyadic conversations, and crowd dynamics. For each of them, the survey considers the three main elements of nonverbal cues analysis, namely data, sensing approaches and computational methodologies. The goal is to highlight the main advances of the last decade, to point out existing limitations, and to outline future directions.
2022-07-20 14:12:05 - ViGAT: Bottom-up event recognition and explanation in video using factorized graph attention network - *Nikolaos Gkalelis, Dimitrios Daskalakis, Vasileios Mezaris* - `2207.09927v1` - [abs](http://arxiv.org/abs/2207.09927v1) - [pdf](http://arxiv.org/pdf/2207.09927v1) > In this paper a pure-attention bottom-up approach, called ViGAT, that utilizes an object detector together with a Vision Transformer (ViT) backbone network to derive object and frame features, and a head network to process these features for the task of event recognition and explanation in video, is proposed. The ViGAT head consists of graph attention network (GAT) blocks factorized along the spatial and temporal dimensions in order to capture effectively both local and long-term dependencies between objects or frames. Moreover, using the weighted in-degrees (WiDs) derived from the adjacency matrices at the various GAT blocks, we show that the proposed architecture can identify the most salient objects and frames that explain the decision of the network. A comprehensive evaluation study is performed, demonstrating that the proposed approach provides state-of-the-art results on three large, publicly available video datasets (FCVID, Mini-Kinetics, ActivityNet).
2022-07-21 03:14:47 - Fast Electromagnetic Validations of Large-Scale Digital Coding Metasurfaces Accelerated by Recurrence Rebuild and Retrieval Method - *Yu Zhao, Shang Xiang, Long Li* - `2112.05082v2` - [abs](http://arxiv.org/abs/2112.05082v2) - [pdf](http://arxiv.org/pdf/2112.05082v2) > The recurrence rebuild and retrieval method (R3M) is proposed in this paper to accelerate the electromagnetic (EM) validations of large-scale digital coding metasurfaces (DCMs). R3M aims to accelerate the EM validations of DCMs with varied codebooks, which involves the analysis of a group of similar but not identical structures. The method transforms general DCMs to rigorously periodic arrays by replacing each coding unit with the macro unit, which comprises all possible coding states. The system matrix corresponding to the rigorously periodic array is globally shared for DCMs with arbitrary codebooks via implicit retrieval. The discrepancy of the interactions for edge and corner units are precluded by the basis extension of periodic boundaries. Moreover, the hierarchical pattern exploitation (HPE) algorithm is leveraged to efficiently assemble the system matrix for further acceleration. Due to the fully utilization of the rigid periodicity, the computational complexity of R3M-HPE is theoretically lower than that of $\mathcal{H}$-matrix within the same paradigm. Numerical results for two types of DCMs indicate that R3M-HPE is accurate in comparison with commercial software. Besides, R3M-HPE is also compatible with the preconditioning for efficient iterative solutions. The efficiency of R3M-HPE for DCMs outperforms the conventional fast algorithms in both the storage and CPU time cost.
2022-07-21 04:26:48 - Deep Learning for Unsupervised Anomaly Localization in Industrial Images: A Survey - *Xian Tao, Xinyi Gong, Xin Zhang, Shaohua Yan, Chandranath Adak* - `2207.10298v1` - [abs](http://arxiv.org/abs/2207.10298v1) - [pdf](http://arxiv.org/pdf/2207.10298v1) > Currently, deep learning-based visual inspection has been highly successful with the help of supervised learning methods. However, in real industrial scenarios, the scarcity of defect samples, the cost of annotation, and the lack of a priori knowledge of defects may render supervised-based methods ineffective. In recent years, unsupervised anomaly localization algorithms have become more widely used in industrial inspection tasks. This paper aims to help researchers in this field by comprehensively surveying recent achievements in unsupervised anomaly localization in industrial images using deep learning. The survey reviews more than 120 significant publications covering different aspects of anomaly localization, mainly covering various concepts, challenges, taxonomies, benchmark datasets, and quantitative performance comparisons of the methods reviewed. In reviewing the achievements to date, this paper provides detailed predictions and analysis of several future research directions. This review provides detailed technical information for researchers interested in industrial anomaly localization and who wish to apply it to the localization of anomalies in other fields.
2022-07-21 05:05:58 - A Survey on Leveraging Pre-trained Generative Adversarial Networks for Image Editing and Restoration - *Ming Liu, Yuxiang Wei, Xiaohe Wu, Wangmeng Zuo, Lei Zhang* - `2207.10309v1` - [abs](http://arxiv.org/abs/2207.10309v1) - [pdf](http://arxiv.org/pdf/2207.10309v1) > Generative adversarial networks (GANs) have drawn enormous attention due to the simple yet effective training mechanism and superior image generation quality. With the ability to generate photo-realistic high-resolution (e.g., $1024\times1024$) images, recent GAN models have greatly narrowed the gaps between the generated images and the real ones. Therefore, many recent works show emerging interest to take advantage of pre-trained GAN models by exploiting the well-disentangled latent space and the learned GAN priors. In this paper, we briefly review recent progress on leveraging pre-trained large-scale GAN models from three aspects, i.e., 1) the training of large-scale generative adversarial networks, 2) exploring and understanding the pre-trained GAN models, and 3) leveraging these models for subsequent tasks like image restoration and editing. More information about relevant methods and repositories can be found at https://github.com/csmliu/pretrained-GANs.
2022-07-21 05:07:32 - Automatic Gaze Analysis: A Survey of Deep Learning based Approaches - *Shreya Ghosh, Abhinav Dhall, Munawar Hayat, Jarrod Knibbe, Qiang Ji* - `2108.05479v3` - [abs](http://arxiv.org/abs/2108.05479v3) - [pdf](http://arxiv.org/pdf/2108.05479v3) > Eye gaze analysis is an important research problem in the field of Computer Vision and Human-Computer Interaction. Even with notable progress in the last 10 years, automatic gaze analysis still remains challenging due to the uniqueness of eye appearance, eye-head interplay, occlusion, image quality, and illumination conditions. There are several open questions, including what are the important cues to interpret gaze direction in an unconstrained environment without prior knowledge and how to encode them in real-time. We review the progress across a range of gaze analysis tasks and applications to elucidate these fundamental questions, identify effective methods in gaze analysis, and provide possible future directions. We analyze recent gaze estimation and segmentation methods, especially in the unsupervised and weakly supervised domain, based on their advantages and reported evaluation metrics. Our analysis shows that the development of a robust and generic gaze analysis method still needs to address real-world challenges such as unconstrained setup and learning with less supervision. We conclude by discussing future research directions for designing a real-world gaze analysis system that can propagate to other domains including Computer Vision, Augmented Reality (AR), Virtual Reality (VR), and Human Computer Interaction (HCI). Project Page: https://github.com/i-am-shreya/EyeGazeSurvey}{https://github.com/i-am-shreya/EyeGazeSurvey
2022-07-21 18:43:03 - BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition - *Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, Yonghui Wu* - `2109.13226v3` - [abs](http://arxiv.org/abs/2109.13226v3) - [pdf](http://arxiv.org/pdf/2109.13226v3) > We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled data. In particular, on an ASR task with 34k hours of labeled data, by fine-tuning an 8 billion parameter pre-trained Conformer model we can match state-of-the-art (SoTA) performance with only 3% of the training data and significantly improve SoTA with the full training set. We also report on the universal benefits gained from using big pre-trained and self-trained models for a large set of downstream tasks that cover a wide range of speech domains and span multiple orders of magnitudes of dataset sizes, including obtaining SoTA performance on many public benchmarks. In addition, we utilize the learned representation of pre-trained networks to achieve SoTA results on non-ASR tasks.
2022-07-22 09:02:02 - Vision-based Human Fall Detection Systems using Deep Learning: A Review - *Ekram Alam, Abu Sufian, Paramartha Dutta, Marco Leo* - `2207.10952v1` - [abs](http://arxiv.org/abs/2207.10952v1) - [pdf](http://arxiv.org/pdf/2207.10952v1) > Human fall is one of the very critical health issues, especially for elders and disabled people living alone. The number of elder populations is increasing steadily worldwide. Therefore, human fall detection is becoming an effective technique for assistive living for those people. For assistive living, deep learning and computer vision have been used largely. In this review article, we discuss deep learning (DL)-based state-of-the-art non-intrusive (vision-based) fall detection techniques. We also present a survey on fall detection benchmark datasets. For a clear understanding, we briefly discuss different metrics which are used to evaluate the performance of the fall detection systems. This article also gives a future direction on vision-based human fall detection techniques.
2022-07-22 10:21:38 - Algorithmic Fairness in Business Analytics: Directions for Research and Practice - *Maria De-Arteaga, Stefan Feuerriegel, Maytal Saar-Tsechansky* - `2207.10991v1` - [abs](http://arxiv.org/abs/2207.10991v1) - [pdf](http://arxiv.org/pdf/2207.10991v1) > The extensive adoption of business analytics (BA) has brought financial gains and increased efficiencies. However, these advances have simultaneously drawn attention to rising legal and ethical challenges when BA inform decisions with fairness implications. As a response to these concerns, the emerging study of algorithmic fairness deals with algorithmic outputs that may result in disparate outcomes or other forms of injustices for subgroups of the population, especially those who have been historically marginalized. Fairness is relevant on the basis of legal compliance, social responsibility, and utility; if not adequately and systematically addressed, unfair BA systems may lead to societal harms and may also threaten an organization's own survival, its competitiveness, and overall performance. This paper offers a forward-looking, BA-focused review of algorithmic fairness. We first review the state-of-the-art research on sources and measures of bias, as well as bias mitigation algorithms. We then provide a detailed discussion of the utility-fairness relationship, emphasizing that the frequent assumption of a trade-off between these two constructs is often mistaken or short-sighted. Finally, we chart a path forward by identifying opportunities for business scholars to address impactful, open challenges that are key to the effective and responsible deployment of BA.
2022-07-22 12:35:22 - Modeling Complex Dependencies for Session-based Recommendations via Graph Neural Networks - *Qian Zhang, Wenpeng Lu* - `2201.12532v2` - [abs](http://arxiv.org/abs/2201.12532v2) - [pdf](http://arxiv.org/pdf/2201.12532v2) > Session-based recommendations (SBRs) capture items' dependencies from the sessions to recommend the next item. In recent years, Graph neural networks (GNN) based SBRs have become the mainstream of SBRs benefited from the superiority of GNN in modeling complex dependencies. Based on a strong assumption of adjacent dependency, any two adjacent items in a session are necessarily dependent in most GNN-based SBRs. However, we argue that due to the uncertainty and complexity of user behaviors, adjacency does not necessarily indicate dependency. However, the above assumptions do not always hold in actual recommendation scenarios, so it can easily lead to two drawbacks: (1) false dependencies occur in the session because there are adjacent but not really dependent items, and (2) the missing of true dependencies occur in the session because there are non-adjacent but actually dependent items. These drawbacks significantly affect item representation learning, degrading the downstream recommendation performance. To address these deficiencies, we propose a novel review-refined inter-item graph neural network (RI-GNN), which utilizes topic information extracted from the reviews of items to improve dependencies between items. Experiments on two public real-world datasets demonstrate that RI-GNN outperforms SOTA methods.
2022-07-22 19:58:17 - Tradeoffs in Preventing Manipulation in Paper Bidding for Reviewer Assignment - *Steven Jecmen, Nihar B. Shah, Fei Fang, Vincent Conitzer* - `2207.11315v1` - [abs](http://arxiv.org/abs/2207.11315v1) - [pdf](http://arxiv.org/pdf/2207.11315v1) > Many conferences rely on paper bidding as a key component of their reviewer assignment procedure. These bids are then taken into account when assigning reviewers to help ensure that each reviewer is assigned to suitable papers. However, despite the benefits of using bids, reliance on paper bidding can allow malicious reviewers to manipulate the paper assignment for unethical purposes (e.g., getting assigned to a friend's paper). Several different approaches to preventing this manipulation have been proposed and deployed. In this paper, we enumerate certain desirable properties that algorithms for addressing bid manipulation should satisfy. We then offer a high-level analysis of various approaches along with directions for future investigation.
2022-07-23 11:00:11 - Real Time Object Detection System with YOLO and CNN Models: A Review - *Viswanatha V, Chandana R K, Ramachandra A. C.* - `2208.00773v1` - [abs](http://arxiv.org/abs/2208.00773v1) - [pdf](http://arxiv.org/pdf/2208.00773v1) > The field of artificial intelligence is built on object detection techniques. YOU ONLY LOOK ONCE (YOLO) algorithm and it's more evolved versions are briefly described in this research survey. This survey is all about YOLO and convolution neural networks (CNN)in the direction of real time object detection.YOLO does generalized object representation more effectively without precision losses than other object detection models.CNN architecture models have the ability to eliminate highlights and identify objects in any given image. When implemented appropriately, CNN models can address issues like deformity diagnosis, creating educational or instructive application, etc. This article reached atnumber of observations and perspective findings through the analysis.Also it provides support for the focused visual information and feature extraction in the financial and other industries, highlights the method of target detection and feature selection, and briefly describe the development process of YOLO algorithm.
2022-07-23 22:37:22 - A Historical Interaction between Artificial Intelligence and Philosophy - *Youheng Zhang* - `2208.04148v1` - [abs](http://arxiv.org/abs/2208.04148v1) - [pdf](http://arxiv.org/pdf/2208.04148v1) > This paper reviews the historical development of AI and representative philosophical thinking from the perspective of the research paradigm. Additionally, it considers the methodology and applications of AI from a philosophical perspective and anticipates its continued advancement. In the history of AI, Symbolism and connectionism are the two main paradigms in AI research. Symbolism holds that the world can be explained by symbols and dealt with through precise, logical processes, but connectionism believes this process should be implemented through artificial neural networks. Regardless of how intelligent machines or programs should achieve their smart goals, the historical development of AI demonstrates the best answer at this time. Still, it is not the final answer of AI research.
2022-07-24 08:25:03 - Low-resource Learning with Knowledge Graphs: A Comprehensive Survey - *Jiaoyan Chen, Yuxia Geng, Zhuo Chen, Jeff Z. Pan, Yuan He, Wen Zhang, Ian Horrocks, Huajun Chen* - `2112.10006v5` - [abs](http://arxiv.org/abs/2112.10006v5) - [pdf](http://arxiv.org/pdf/2112.10006v5) > Machine learning methods especially deep neural networks have achieved great success but many of them often rely on a number of labeled samples for training. In real-world applications, we often need to address sample shortage due to e.g., dynamic contexts with emerging prediction targets and costly sample annotation. Therefore, low-resource learning, which aims to learn robust prediction models with limited training samples, is now being widely investigated. Among all the low-resource learning studies, many prefer to utilize some auxiliary information in the form of Knowledge Graph (KG), which is becoming more and more popular for knowledge representation, to reduce the reliance on labeled samples. In this survey, we very comprehensively reviewed over 90 papers about KG-aware research for two major low-resource learning settings -- zero-shot learning (ZSL) where new classes for prediction have never appeared in training, and few-shot learning (FSL) where new classes for prediction have only a small number of labeled samples that are available. We first introduced the KGs used in ZSL and FSL studies as well as the existing and potential KG construction solutions, and then systematically categorized and summarized KG-aware ZSL and FSL methods, dividing them into different paradigms such as the mapping-based, the data augmentation, the propagation-based and the optimization-based. We next presented different applications, including not only KG augmented prediction tasks in Computer Vision and Natural Language Processing (e.g., image classification, visual question answering and knowledge extraction), but also tasks for KG curation (e.g., inductive KG completion), and some typical evaluation resources for each task. We eventually discussed some challenges and future directions on aspects such as new learning and reasoning paradigms, and the construction of high quality KGs.
2022-07-24 08:54:57 - The Real Deal: A Review of Challenges and Opportunities in Moving Reinforcement Learning-Based Traffic Signal Control Systems Towards Reality - *Rex Chen, Fei Fang, Norman Sadeh* - `2206.11996v2` - [abs](http://arxiv.org/abs/2206.11996v2) - [pdf](http://arxiv.org/pdf/2206.11996v2) > Traffic signal control (TSC) is a high-stakes domain that is growing in importance as traffic volume grows globally. An increasing number of works are applying reinforcement learning (RL) to TSC; RL can draw on an abundance of traffic data to improve signalling efficiency. However, RL-based signal controllers have never been deployed. In this work, we provide the first review of challenges that must be addressed before RL can be deployed for TSC. We focus on four challenges involving (1) uncertainty in detection, (2) reliability of communications, (3) compliance and interpretability, and (4) heterogeneous road users. We show that the literature on RL-based TSC has made some progress towards addressing each challenge. However, more work should take a systems thinking approach that considers the impacts of other pipeline components on RL.
2022-07-24 17:56:27 - Enhancements to the BOUN Treebank Reflecting the Agglutinative Nature of Turkish - *Büşra Marşan, Salih Furkan Akkurt, Muhammet Şen, Merve Gürbüz, Onur Güngör, Şaziye Betül Özateş, Suzan Üsküdarlı, Arzucan Özgür, Tunga Güngör, Balkız Öztürk* - `2207.11782v1` - [abs](http://arxiv.org/abs/2207.11782v1) - [pdf](http://arxiv.org/pdf/2207.11782v1) > In this study, we aim to offer linguistically motivated solutions to resolve the issues of the lack of representation of null morphemes, highly productive derivational processes, and syncretic morphemes of Turkish in the BOUN Treebank without diverging from the Universal Dependencies framework. In order to tackle these issues, new annotation conventions were introduced by splitting certain lemmas and employing the MISC (miscellaneous) tab in the UD framework to denote derivation. Representational capabilities of the re-annotated treebank were tested on a LSTM-based dependency parser and an updated version of the BoAT Tool is introduced.
2022-07-25 09:51:17 - A Survey of Historical Document Image Datasets - *Konstantina Nikolaidou, Mathias Seuret, Hamam Mokayed, Marcus Liwicki* - `2203.08504v2` - [abs](http://arxiv.org/abs/2203.08504v2) - [pdf](http://arxiv.org/pdf/2203.08504v2) > This paper presents a systematic literature review of image datasets for document image analysis, focusing on historical documents, such as handwritten manuscripts and early prints. Finding appropriate datasets for historical document analysis is a crucial prerequisite to facilitate research using different machine learning algorithms. However, because of the very large variety of the actual data (e.g., scripts, tasks, dates, support systems, and amount of deterioration), the different formats for data and label representation, and the different evaluation processes and benchmarks, finding appropriate datasets is a difficult task. This work fills this gap, presenting a meta-study on existing datasets. After a systematic selection process (according to PRISMA guidelines), we select 56 studies that are chosen based on different factors, such as the year of publication, number of methods implemented in the article, reliability of the chosen algorithms, dataset size, and journal outlet. We summarize each study by assigning it to one of three pre-defined tasks: document classification, layout structure, or semantic analysis. We present the statistics, document type, language, tasks, input visual aspects, and ground truth information for every dataset. In addition, we provide the benchmark tasks and results from these papers or recent competitions. We further discuss gaps and challenges in this domain. We advocate for providing conversion tools to common formats (e.g., COCO format for computer vision tasks) and always providing a set of evaluation metrics, instead of just one, to make results comparable across studies.
2022-07-25 10:47:47 - A Reference Data Model for Process-Related User Interaction Logs - *Luka Abb, Jana-Rebecca Rehse* - `2207.12054v1` - [abs](http://arxiv.org/abs/2207.12054v1) - [pdf](http://arxiv.org/pdf/2207.12054v1) > User interaction (UI) logs are high-resolution event logs that record low-level activities performed by a user during the execution of a task in an information system. Each event in a UI log corresponds to a single interaction between the user and the interface, such as clicking a button or entering a string into a text field. UI logs are used for purposes like task mining or robotic process automation (RPA), but each study and tool relies on a different conceptualization and implementation of the elements and attributes that constitute user interactions. This lack of standardization makes it difficult to integrate UI logs from different sources and to combine tools for UI data collection with downstream analytics or automation solutions. To address this, we propose a universally applicable reference data model for process-related UI logs. Based on a review of scientific literature and industry solutions, this model includes the core attributes of UI logs, but remains flexible with regard to the scope, level of abstraction, and case notion. We provide an implementation of the model as an extension to the XES interchange standard for event logs and demonstrate its practical applicability in a real-life RPA scenario.
2022-07-25 17:49:29 - LightX3ECG: A Lightweight and eXplainable Deep Learning System for 3-lead Electrocardiogram Classification - *Khiem H. Le, Hieu H. Pham, Thao BT. Nguyen, Tu A. Nguyen, Tien N. Thanh, Cuong D. Do* - `2207.12381v1` - [abs](http://arxiv.org/abs/2207.12381v1) - [pdf](http://arxiv.org/pdf/2207.12381v1) > Cardiovascular diseases (CVDs) are a group of heart and blood vessel disorders that is one of the most serious dangers to human health, and the number of such patients is still growing. Early and accurate detection plays a key role in successful treatment and intervention. Electrocardiogram (ECG) is the gold standard for identifying a variety of cardiovascular abnormalities. In clinical practices and most of the current research, standard 12-lead ECG is mainly used. However, using a lower number of leads can make ECG more prevalent as it can be conveniently recorded by portable or wearable devices. In this research, we develop a novel deep learning system to accurately identify multiple cardiovascular abnormalities by using only three ECG leads.
2022-07-25 23:21:48 - Innovations in Neural Data-to-text Generation - *Mandar Sharma, Ajay Gogineni, Naren Ramakrishnan* - `2207.12571v1` - [abs](http://arxiv.org/abs/2207.12571v1) - [pdf](http://arxiv.org/pdf/2207.12571v1) > The neural boom that has sparked natural language processing (NLP) research through the last decade has similarly led to significant innovations in data-to-text generation (DTG). This survey offers a consolidated view into the neural DTG paradigm with a structured examination of the approaches, benchmark datasets, and evaluation protocols. This survey draws boundaries separating DTG from the rest of the natural language generation (NLG) landscape, encompassing an up-to-date synthesis of the literature, and highlighting the stages of technological adoption from within and outside the greater NLG umbrella. With this holistic view, we highlight promising avenues for DTG research that not only focus on the design of linguistically capable systems but also systems that exhibit fairness and accountability.
2022-07-26 01:45:54 - A Survey of Explainable Graph Neural Networks: Taxonomy and Evaluation Metrics - *Yiqiao Li, Jianlong Zhou, Sunny Verma, Fang Chen* - `2207.12599v1` - [abs](http://arxiv.org/abs/2207.12599v1) - [pdf](http://arxiv.org/pdf/2207.12599v1) > Graph neural networks (GNNs) have demonstrated a significant boost in prediction performance on graph data. At the same time, the predictions made by these models are often hard to interpret. In that regard, many efforts have been made to explain the prediction mechanisms of these models from perspectives such as GNNExplainer, XGNN and PGExplainer. Although such works present systematic frameworks to interpret GNNs, a holistic review for explainable GNNs is unavailable. In this survey, we present a comprehensive review of explainability techniques developed for GNNs. We focus on explainable graph neural networks and categorize them based on the use of explainable methods. We further provide the common performance metrics for GNNs explanations and point out several future research directions.
2022-07-26 04:16:00 - Learning Bipedal Walking On Planned Footsteps For Humanoid Robots - *Rohan Pratap Singh, Mehdi Benallegue, Mitsuharu Morisawa, Rafael Cisneros, Fumio Kanehiro* - `2207.12644v1` - [abs](http://arxiv.org/abs/2207.12644v1) - [pdf](http://arxiv.org/pdf/2207.12644v1) > Deep reinforcement learning (RL) based controllers for legged robots have demonstrated impressive robustness for walking in different environments for several robot platforms. To enable the application of RL policies for humanoid robots in real-world settings, it is crucial to build a system that can achieve robust walking in any direction, on 2D and 3D terrains, and be controllable by a user-command. In this paper, we tackle this problem by learning a policy to follow a given step sequence. The policy is trained with the help of a set of procedurally generated step sequences (also called footstep plans). We show that simply feeding the upcoming 2 steps to the policy is sufficient to achieve omnidirectional walking, turning in place, standing, and climbing stairs. Our method employs curriculum learning on the complexity of terrains, and circumvents the need for reference motions or pre-trained weights. We demonstrate the application of our proposed method to learn RL policies for 2 new robot platforms - HRP5P and JVRC-1 - in the MuJoCo simulation environment. The code for training and evaluation is available online.
2022-07-26 08:26:26 - An Automated News Bias Classifier Using Caenorhabditis Elegans Inspired Recursive Feedback Network Architecture - *Agastya Sridharan, Natarajan S* - `2207.12724v1` - [abs](http://arxiv.org/abs/2207.12724v1) - [pdf](http://arxiv.org/pdf/2207.12724v1) > Traditional approaches to classify the political bias of news articles have failed to generate accurate, generalizable results. Existing networks premised on CNNs and DNNs lack a model to identify and extrapolate subtle indicators of bias like word choice, context, and presentation. In this paper, we propose a network architecture that achieves human-level accuracy in assigning bias classifications to articles. The underlying model is based on a novel Mesh Neural Network (MNN),this structure enables feedback and feedforward synaptic connections between any two neurons in the mesh. The MNN ontains six network configurations that utilize Bernoulli based random sampling, pre-trained DNNs, and a network modelled after the C. Elegans nematode. The model is trained on over ten-thousand articles scraped from AllSides.com which are labelled to indicate political bias. The parameters of the network are then evolved using a genetic algorithm suited to the feedback neural structure. Finally, the best performing model is applied to five popular news sources in the United States over a fifty-day trial to quantify political biases in the articles they display. We hope our project can spur research into biological solutions for NLP tasks and provide accurate tools for citizens to understand subtle biases in the articles they consume.
2022-07-26 14:28:38 - A Guide to Image and Video based Small Object Detection using Deep Learning : Case Study of Maritime Surveillance - *Aref Miri Rekavandi, Lian Xu, Farid Boussaid, Abd-Krim Seghouane, Stephen Hoefs, Mohammed Bennamoun* - `2207.12926v1` - [abs](http://arxiv.org/abs/2207.12926v1) - [pdf](http://arxiv.org/pdf/2207.12926v1) > Small object detection (SOD) in optical images and videos is a challenging problem that even state-of-the-art generic object detection methods fail to accurately localize and identify such objects. Typically, small objects appear in real-world due to large camera-object distance. Because small objects occupy only a small area in the input image (e.g., less than 10%), the information extracted from such a small area is not always rich enough to support decision making. Multidisciplinary strategies are being developed by researchers working at the interface of deep learning and computer vision to enhance the performance of SOD deep learning based methods. In this paper, we provide a comprehensive review of over 160 research papers published between 2017 and 2022 in order to survey this growing subject. This paper summarizes the existing literature and provide a taxonomy that illustrates the broad picture of current research. We investigate how to improve the performance of small object detection in maritime environments, where increasing performance is critical. By establishing a connection between generic and maritime SOD research, future directions have been identified. In addition, the popular datasets that have been used for SOD for generic and maritime applications are discussed, and also well-known evaluation metrics for the state-of-the-art methods on some of the datasets are provided.
2022-07-26 15:01:37 - FastGeodis: Fast Generalised Geodesic Distance Transform - *Muhammad Asad, Reuben Dorent, Tom Vercauteren* - `2208.00001v1` - [abs](http://arxiv.org/abs/2208.00001v1) - [pdf](http://arxiv.org/pdf/2208.00001v1) > The FastGeodis package provides an efficient implementation for computing Geodesic and Euclidean distance transforms (or a mixture of both) targeting efficient utilisation of CPU and GPU hardwares. In particular, it implements paralellisable raster scan method from Criminisi et al, where elements in row (2D) or plane (3D) can be computed with parallel threads. This package is able to handle 2D as well as 3D data where it achieves up to 15x speed-up on CPU and up to 60x speed-up on GPU as compared to existing open-source libraries, which uses non-parallelisable single-thread CPU implementation. The performance speed-ups reported here were evaluated using 3D volume data on Nvidia GeForce Titan X (12 GB) with 6-Core Intel Xeon E5-1650 CPU. This package is available at: https://github.com/masadcv/FastGeodis
2022-07-26 15:55:36 - Robust and Efficient Segmentation of Cross-domain Medical Images - *Xingqun Qi, Zhuojie Wu, Min Ren, Muyi Sun, Zhenan Sun* - `2207.12995v1` - [abs](http://arxiv.org/abs/2207.12995v1) - [pdf](http://arxiv.org/pdf/2207.12995v1) > Efficient medical image segmentation aims to provide accurate pixel-wise prediction for the medical images with the lightweight implementation framework. However, lightweight frameworks generally fail to achieve high performance, and suffer from the poor generalizable ability on cross-domain tasks.In this paper, we propose a generalizable knowledge distillation method for robust and efficient segmentation of cross-domain medical images. Primarily, we propose the Model-Specific Alignment Networks (MSAN) to provide the domain-invariant representations which are regularized by a Pre-trained Semantic AutoEncoder (P-SAE). Meanwhile, a customized Alignment Consistency Training (ACT) strategy is designed to promote the MSAN training. With the domain-invariant representative vectors in MSAN, we propose two generalizable knowledge distillation schemes, Dual Contrastive Graph Distillation (DCGD) and Domain-Invariant Cross Distillation (DICD). Specifically, in DCGD, two types of implicit contrastive graphs are designed to represent the intra-coupling and inter-coupling semantic correlations from the perspective of data distribution. In DICD, the domain-invariant semantic vectors from the two models (i.e., teacher and student) are leveraged to cross-reconstruct features by the header exchange of MSAN, which achieves generalizable improvement for both the encoder and decoder in the student model. Furthermore, a metric named Fr\'echet Semantic Distance (FSD) is tailored to verify the effectiveness of the regularized domain-invariant features. Extensive experiments conducted on the Liver and Retinal Vessel Segmentation datasets demonstrate the priority of our method, in terms of performance and generalization on lightweight frameworks.
2022-07-26 16:09:07 - Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark - *Zhenran Xu, Zifei Shan, Yuxin Li, Baotian Hu, Bing Qin* - `2207.13005v1` - [abs](http://arxiv.org/abs/2207.13005v1) - [pdf](http://arxiv.org/pdf/2207.13005v1) > Modern Entity Linking (EL) systems entrench a popularity bias, yet there is no dataset focusing on tail and emerging entities in languages other than English. We present Hansel, a new benchmark in Chinese that fills the vacancy of non-English few-shot and zero-shot EL challenges. The test set of Hansel is human annotated and reviewed, created with a novel method for collecting zero-shot EL datasets. It covers 10K diverse documents in news, social media posts and other web articles, with Wikidata as its target Knowledge Base. We demonstrate that the existing state-of-the-art EL system performs poorly on Hansel (R@1 of 36.6% on Few-Shot). We then establish a strong baseline that scores a R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot on our dataset. We also show that our baseline achieves competitive results on TAC-KBP2015 Chinese Entity Linking task.
2022-07-26 17:13:53 - Efficient High-Resolution Deep Learning: A Survey - *Arian Bakhtiarnia, Qi Zhang, Alexandros Iosifidis* - `2207.13050v1` - [abs](http://arxiv.org/abs/2207.13050v1) - [pdf](http://arxiv.org/pdf/2207.13050v1) > Cameras in modern devices such as smartphones, satellites and medical equipment are capable of capturing very high resolution images and videos. Such high-resolution data often need to be processed by deep learning models for cancer detection, automated road navigation, weather prediction, surveillance, optimizing agricultural processes and many other applications. Using high-resolution images and videos as direct inputs for deep learning models creates many challenges due to their high number of parameters, computation cost, inference latency and GPU memory consumption. Simple approaches such as resizing the images to a lower resolution are common in the literature, however, they typically significantly decrease accuracy. Several works in the literature propose better alternatives in order to deal with the challenges of high-resolution data and improve accuracy and speed while complying with hardware limitations and time restrictions. This survey describes such efficient high-resolution deep learning methods, summarizes real-world applications of high-resolution deep learning, and provides comprehensive information about available high-resolution datasets.
2022-07-26 18:00:04 - Multimodal Image Synthesis and Editing: A Survey - *Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Shijian Lu, Lingjie Liu, Adam Kortylewski, Christian Theobalt, Eric Xing* - `2112.13592v3` - [abs](http://arxiv.org/abs/2112.13592v3) - [pdf](http://arxiv.org/pdf/2112.13592v3) > As information exists in various modalities in real world, effective interaction and fusion among multimodal information plays a key role for the creation and perception of multimodal data in computer vision and deep learning research. With superb power in modelling the interaction among multimodal information, multimodal image synthesis and editing has become a hot research topic in recent years. Instead of providing explicit guidance for network training, multimodal guidance offers intuitive and flexible means for image synthesis and editing. On the other hand, this field is also facing several challenges in alignment of features with inherent modality gaps, synthesis of high-resolution images, faithful evaluation metrics, etc. In this survey, we comprehensively contextualize the advance of the recent multimodal image synthesis and editing and formulate taxonomies according to data modality and model architectures. We start with an introduction to different types of guidance modalities in image synthesis and editing. We then describe multimodal image synthesis and editing approaches extensively with detailed frameworks including Generative Adversarial Networks (GANs), Auto-regressive models, Diffusion models, Neural Radiance Fields (NeRF) and other methods. This is followed by a comprehensive description of benchmark datasets and corresponding evaluation metrics as widely adopted in multimodal image synthesis and editing, as well as detailed comparisons of various synthesis methods with analysis of respective advantages and limitations. Finally, we provide insights about the current research challenges and possible directions for future research. We hope this survey could lay a sound and valuable foundation for future development of multimodal image synthesis and editing. A project associated with this survey is available at https://github.com/fnzhan/MISE.
2022-07-26 20:56:18 - Planning and Learning: A Review of Methods involving Path-Planning for Autonomous Vehicles - *Kevin Osanlou, Christophe Guettier, Tristan Cazenave, Eric Jacopin* - `2207.13181v1` - [abs](http://arxiv.org/abs/2207.13181v1) - [pdf](http://arxiv.org/pdf/2207.13181v1) > This short review aims to make the reader familiar with state-of-the-art works relating to planning, scheduling and learning. First, we study state-of-the-art planning algorithms. We give a brief introduction of neural networks. Then we explore in more detail graph neural networks, a recent variant of neural networks suited for processing graph-structured inputs. We describe briefly the concept of reinforcement learning algorithms and some approaches designed to date. Next, we study some successful approaches combining neural networks for path-planning. Lastly, we focus on temporal planning problems with uncertainty.
2022-07-26 23:20:03 - A Survey of Intent Classification and Slot-Filling Datasets for Task-Oriented Dialog - *Stefan Larson, Kevin Leach* - `2207.13211v1` - [abs](http://arxiv.org/abs/2207.13211v1) - [pdf](http://arxiv.org/pdf/2207.13211v1) > Interest in dialog systems has grown substantially in the past decade. By extension, so too has interest in developing and improving intent classification and slot-filling models, which are two components that are commonly used in task-oriented dialog systems. Moreover, good evaluation benchmarks are important in helping to compare and analyze systems that incorporate such models. Unfortunately, much of the literature in the field is limited to analysis of relatively few benchmark datasets. In an effort to promote more robust analyses of task-oriented dialog systems, we have conducted a survey of publicly available datasets for the tasks of intent classification and slot-filling. We catalog the important characteristics of each dataset, and offer discussion on the applicability, strengths, and weaknesses of each. Our goal is that this survey aids in increasing the accessibility of these datasets, which we hope will enable their use in future evaluations of intent classification and slot-filling models for task-oriented dialog systems.
2022-07-27 10:37:14 - Detecting Spam Reviews on Vietnamese E-commerce Websites - *Co Van Dinh, Son T. Luu, Anh Gia-Tuan Nguyen* - `2207.14636v1` - [abs](http://arxiv.org/abs/2207.14636v1) - [pdf](http://arxiv.org/pdf/2207.14636v1) > The reviews of customers play an essential role in online shopping. People often refer to reviews or comments of previous customers to decide whether to buy a new product. Catching up with this behavior, some people create untruths and illegitimate reviews to hoax customers about the fake quality of products. These reviews are called spam reviews, which confuse consumers on online shopping platforms and negatively affect online shopping behaviors. We propose the dataset called ViSpamReviews, which has a strict annotation procedure for detecting spam reviews on e-commerce platforms. Our dataset consists of two tasks: the binary classification task for detecting whether a review is a spam or not and the multi-class classification task for identifying the type of spam. The PhoBERT obtained the highest results on both tasks, 88.93% and 72.17%, respectively, by macro average F1 score.
2022-07-27 17:05:16 - Using Deep Learning to Detecting Deepfakes - *Jacob Mallet, Rushit Dave, Naeem Seliya, Mounika Vanamala* - `2207.13644v1` - [abs](http://arxiv.org/abs/2207.13644v1) - [pdf](http://arxiv.org/pdf/2207.13644v1) > In the recent years, social media has grown to become a major source of information for many online users. This has given rise to the spread of misinformation through deepfakes. Deepfakes are videos or images that replace one persons face with another computer-generated face, often a more recognizable person in society. With the recent advances in technology, a person with little technological experience can generate these videos. This enables them to mimic a power figure in society, such as a president or celebrity, creating the potential danger of spreading misinformation and other nefarious uses of deepfakes. To combat this online threat, researchers have developed models that are designed to detect deepfakes. This study looks at various deepfake detection models that use deep learning algorithms to combat this looming threat. This survey focuses on providing a comprehensive overview of the current state of deepfake detection models and the unique approaches many researchers take to solving this problem. The benefits, limitations, and suggestions for future work will be thoroughly discussed throughout this paper.
2022-07-27 22:12:29 - The Familiarity Hypothesis: Explaining the Behavior of Deep Open Set Methods - *Thomas G. Dietterich, Alexander Guyer* - `2203.02486v4` - [abs](http://arxiv.org/abs/2203.02486v4) - [pdf](http://arxiv.org/pdf/2203.02486v4) > In many object recognition applications, the set of possible categories is an open set, and the deployed recognition system will encounter novel objects belonging to categories unseen during training. Detecting such "novel category" objects is usually formulated as an anomaly detection problem. Anomaly detection algorithms for feature-vector data identify anomalies as outliers, but outlier detection has not worked well in deep learning. Instead, methods based on the computed logits of visual object classifiers give state-of-the-art performance. This paper proposes the Familiarity Hypothesis that these methods succeed because they are detecting the absence of familiar learned features rather than the presence of novelty. This distinction is important, because familiarity-based detection will fail in many situations where novelty is present. For example when an image contains both a novel object and a familiar one, the familiarity score will be high, so the novel object will not be noticed. The paper reviews evidence from the literature and presents additional evidence from our own experiments that provide strong support for this hypothesis. The paper concludes with a discussion of whether familiarity-based detection is an inevitable consequence of representation learning.
2022-07-28 06:25:49 - The Minimum Description Length Principle for Pattern Mining: A Survey - *Esther Galbrun* - `2007.14009v5` - [abs](http://arxiv.org/abs/2007.14009v5) - [pdf](http://arxiv.org/pdf/2007.14009v5) > This is about the Minimum Description Length (MDL) principle applied to pattern mining. The length of this description is kept to the minimum. Mining patterns is a core task in data analysis and, beyond issues of efficient enumeration, the selection of patterns constitutes a major challenge. The MDL principle, a model selection method grounded in information theory, has been applied to pattern mining with the aim to obtain compact high-quality sets of patterns. After giving an outline of relevant concepts from information theory and coding, as well as of work on the theory behind the MDL and similar principles, we review MDL-based methods for mining various types of data and patterns. Finally, we open a discussion on some issues regarding these methods, and highlight currently active related data analysis problems.
2022-07-28 08:09:18 - Algorithmic Foundation of Deep X-Risk Optimization - *Tianbao Yang* - `2206.00439v4` - [abs](http://arxiv.org/abs/2206.00439v4) - [pdf](http://arxiv.org/pdf/2206.00439v4) > X-risk is a term introduced to represent a family of compositional measures or objectives, in which each data point is compared with a large number of items explicitly or implicitly for defining a risk function. It includes many widely used measures or objectives, e.g., AUROC, AUPRC, partial AUROC, NDCG, MAP, top-$K$ NDCG, top-$K$ MAP, listwise losses, p-norm push, top push, precision/recall at top $K$ positions, precision at a certain recall level, contrastive objectives, etc. While these non-decomposable measures/objectives and their optimization algorithms have been studied in the literature of machine learning, computer vision, information retrieval, and etc, optimizing these measures/objectives has encountered some unique challenges for deep learning. In this paper, we survey recent rigorous efforts for deep X-risk optimization (DXO) by focusing on its algorithmic foundation. We introduce a class of techniques for optimizing X-risks for deep learning. We formulate DXO into three special families of non-convex optimization problems belonging to non-convex min-max optimization, non-convex compositional optimization, and non-convex bilevel optimization, respectively. For each family of problems, we present some strong baseline algorithms and their complexities, which will motivate further research for improving the existing results. Discussions about the presented results and future studies are given at the end. Efficient algorithms for optimizing a variety of X-risks are implemented in the LibAUC library at www.libauc.org.
2022-07-28 14:33:00 - A Survey of Syntactic Modelling Structures in Biomedical Ontologies - *Christian Kindermann, Martin G. Skjæveland* - `2207.14119v1` - [abs](http://arxiv.org/abs/2207.14119v1) - [pdf](http://arxiv.org/pdf/2207.14119v1) > Despite the large-scale uptake of semantic technologies in the biomedical domain, little is known about common modelling practices in published ontologies. OWL ontologies are often published only in the crude form of sets of axioms leaving the underlying design opaque. However, a principled and systematic ontology development life cycle is likely to be reflected in regularities of the ontology's emergent syntactic structure. To develop an understanding of this emergent structure, we propose to reverse-engineer ontologies taking a syntax-directed approach for identifying and analysing regularities for axioms and sets of axioms. We survey BioPortal in terms of syntactic modelling trends and common practices for OWL axioms and class frames. Our findings suggest that biomedical ontologies only share simple syntactic structures in which OWL constructors are not deeply nested or combined in a complex manner. While such simple structures often account for large proportions of axioms in a given ontology, many ontologies also contain non-trivial amounts of more complex syntactic structures that are not common across ontologies.
2022-07-28 16:15:17 - Progressive Voronoi Diagram Subdivision: Towards A Holistic Geometric Framework for Exemplar-free Class-Incremental Learning - *Chunwei Ma, Zhanghexuan Ji, Ziyun Huang, Yan Shen, Mingchen Gao, Jinhui Xu* - `2207.14202v1` - [abs](http://arxiv.org/abs/2207.14202v1) - [pdf](http://arxiv.org/pdf/2207.14202v1) > Exemplar-free Class-incremental Learning (CIL) is a challenging problem because rehearsing data from previous phases is strictly prohibited, causing catastrophic forgetting of Deep Neural Networks (DNNs). In this paper, we present iVoro, a holistic framework for CIL, derived from computational geometry. We found Voronoi Diagram (VD), a classical model for space subdivision, is especially powerful for solving the CIL problem, because VD itself can be constructed favorably in an incremental manner -- the newly added sites (classes) will only affect the proximate classes, making the non-contiguous classes hardly forgettable. Further, in order to find a better set of centers for VD construction, we colligate DNN with VD using Power Diagram and show that the VD structure can be optimized by integrating local DNN models using a divide-and-conquer algorithm. Moreover, our VD construction is not restricted to the deep feature space, but is also applicable to multiple intermediate feature spaces, promoting VD to be multi-centered VD (CIVD) that efficiently captures multi-grained features from DNN. Importantly, iVoro is also capable of handling uncertainty-aware test-time Voronoi cell assignment and has exhibited high correlations between geometric uncertainty and predictive accuracy (up to ~0.9). Putting everything together, iVoro achieves up to 25.26%, 37.09%, and 33.21% improvements on CIFAR-100, TinyImageNet, and ImageNet-Subset, respectively, compared to the state-of-the-art non-exemplar CIL approaches. In conclusion, iVoro enables highly accurate, privacy-preserving, and geometrically interpretable CIL that is particularly useful when cross-phase data sharing is forbidden, e.g. in medical applications. Our code is available at https://machunwei.github.io/ivoro.
2022-07-28 16:43:31 - Humans disagree with the IoU for measuring object detector localization error - *Ombretta Strafforello, Vanathi Rajasekart, Osman S. Kayhan, Oana Inel, Jan van Gemert* - `2207.14221v1` - [abs](http://arxiv.org/abs/2207.14221v1) - [pdf](http://arxiv.org/pdf/2207.14221v1) > The localization quality of automatic object detectors is typically evaluated by the Intersection over Union (IoU) score. In this work, we show that humans have a different view on localization quality. To evaluate this, we conduct a survey with more than 70 participants. Results show that for localization errors with the exact same IoU score, humans might not consider that these errors are equal, and express a preference. Our work is the first to evaluate IoU with humans and makes it clear that relying on IoU scores alone to evaluate localization errors might not be sufficient.
2022-07-28 18:11:17 - Automated liver tissues delineation techniques: A systematic survey on machine learning current trends and future orientations - *Ayman Al-Kababji, Faycal Bensaali, Sarada Prasad Dakua, Yassine Himeur* - `2103.06384v2` - [abs](http://arxiv.org/abs/2103.06384v2) - [pdf](http://arxiv.org/pdf/2103.06384v2) > Machine learning and computer vision techniques have grown rapidly in recent years due to their automation, suitability, and ability to generate astounding results. Hence, in this paper, we survey the key studies that are published between 2014 and 2022, showcasing the different machine learning algorithms researchers have used to segment the liver, hepatic tumors, and hepatic-vasculature structures. We divide the surveyed studies based on the tissue of interest (hepatic-parenchyma, hepatic-tumors, or hepatic-vessels), highlighting the studies that tackle more than one task simultaneously. Additionally, the machine learning algorithms are classified as either supervised or unsupervised, and they are further partitioned if the amount of work that falls under a certain scheme is significant. Moreover, different datasets and challenges found in literature and websites containing masks of the aforementioned tissues are thoroughly discussed, highlighting the organizers' original contributions and those of other researchers. Also, the metrics used excessively in literature are mentioned in our review, stressing their relevance to the task at hand. Finally, critical challenges and future directions are emphasized for innovative researchers to tackle, exposing gaps that need addressing, such as the scarcity of many studies on the vessels' segmentation challenge and why their absence needs to be dealt with sooner than later.
2022-07-28 18:52:08 - Aztec curve: proposal for a new space-filling curve - *Diego Ayala, Daniel Durini, Jose Rangel-Magdaleno* - `2207.14345v1` - [abs](http://arxiv.org/abs/2207.14345v1) - [pdf](http://arxiv.org/pdf/2207.14345v1) > Different space-filling curves (SFCs) are briefly reviewed in this paper, and a new one is proposed. A century has passed between the inception of this kind of curves, since then they have been found useful in computer science, particularly in data storage and indexing due to their clustering properties, being Hilbert curve the most well-known member of the family of fractals. The proposed Aztec curve, with similar characteristics to the Hilbert's curve, is introduced in this paper, accompanied by a grammatical description for its construction. It yields the possibility of creating bi-dimensional clusters, not available for Hilbert nor Peano curves. Additional to this, a case of application on the scope of Compressed Sensing is implemented, in which the use of Hilbert curve is contrasted with Aztec curve, having a similar performance, and positioning the Aztec curve as viable and a new alternative for future exploitation on applications that make use of SFC's.
2022-07-28 21:09:57 - Learning-based Monocular 3D Reconstruction of Birds: A Contemporary Survey - *Seyed Mojtaba Marvasti-Zadeh, Mohammad N. S. Jahromi, Javad Khaghani, Devin Goodsman, Nilanjan Ray, Nadir Erbilgin* - `2207.04512v2` - [abs](http://arxiv.org/abs/2207.04512v2) - [pdf](http://arxiv.org/pdf/2207.04512v2) > In nature, the collective behavior of animals, such as flying birds is dominated by the interactions between individuals of the same species. However, the study of such behavior among the bird species is a complex process that humans cannot perform using conventional visual observational techniques such as focal sampling in nature. For social animals such as birds, the mechanism of group formation can help ecologists understand the relationship between social cues and their visual characteristics over time (e.g., pose and shape). But, recovering the varying pose and shapes of flying birds is a highly challenging problem. A widely-adopted solution to tackle this bottleneck is to extract the pose and shape information from 2D image to 3D correspondence. Recent advances in 3D vision have led to a number of impressive works on the 3D shape and pose estimation, each with different pros and cons. To the best of our knowledge, this work is the first attempt to provide an overview of recent advances in 3D bird reconstruction based on monocular vision, give both computer vision and biology researchers an overview of existing approaches, and compare their characteristics.
2022-07-28 22:24:40 - Rating and aspect-based opinion graph embeddings for explainable recommendations - *Iván Cantador, Andrés Carvallo, Fernando Diez* - `2107.03385v2` - [abs](http://arxiv.org/abs/2107.03385v2) - [pdf](http://arxiv.org/pdf/2107.03385v2) > The success of neural network embeddings has entailed a renewed interest in using knowledge graphs for a wide variety of machine learning and information retrieval tasks. In particular, recent recommendation methods based on graph embeddings have shown state-of-the-art performance. In general, these methods encode latent rating patterns and content features. Differently from previous work, in this paper, we propose to exploit embeddings extracted from graphs that combine information from ratings and aspect-based opinions expressed in textual reviews. We then adapt and evaluate state-of-the-art graph embedding techniques over graphs generated from Amazon and Yelp reviews on six domains, outperforming baseline recommenders. Additionally, our method has the advantage of providing explanations that involve the coverage of aspect-based opinions given by users about recommended items.
2022-07-28 22:24:52 - Graphing else matters: exploiting aspect opinions and ratings in explainable graph-based recommendations - *Iván Cantador, Andrés Carvallo, Fernando Diez, Denis Parra* - `2107.03226v2` - [abs](http://arxiv.org/abs/2107.03226v2) - [pdf](http://arxiv.org/pdf/2107.03226v2) > The success of neural network embeddings has entailed a renewed interest in using knowledge graphs for a wide variety of machine learning and information retrieval tasks. In particular, current recommendation methods based on graph embeddings have shown state-of-the-art performance. These methods commonly encode latent rating patterns and content features. Different from previous work, in this paper, we propose to exploit embeddings extracted from graphs that combine information from ratings and aspect-based opinions expressed in textual reviews. We then adapt and evaluate state-of-the-art graph embedding techniques over graphs generated from Amazon and Yelp reviews on six domains, outperforming baseline recommenders. Our approach has the advantage of providing explanations which leverage aspect-based opinions given by users about recommended items. Furthermore, we also provide examples of the applicability of recommendations utilizing aspect opinions as explanations in a visualization dashboard, which allows obtaining information about the most and least liked aspects of similar users obtained from the embeddings of an input graph.
2022-07-29 03:10:18 - Deep Learning-based Occluded Person Re-identification: A Survey - *Yunjie Peng, Saihui Hou, Chunshui Cao, Xu Liu, Yongzhen Huang, Zhiqiang He* - `2207.14452v1` - [abs](http://arxiv.org/abs/2207.14452v1) - [pdf](http://arxiv.org/pdf/2207.14452v1) > Occluded person re-identification (Re-ID) aims at addressing the occlusion problem when retrieving the person of interest across multiple cameras. With the promotion of deep learning technology and the increasing demand for intelligent video surveillance, the frequent occlusion in real-world applications has made occluded person Re-ID draw considerable interest from researchers. A large number of occluded person Re-ID methods have been proposed while there are few surveys that focus on occlusion. To fill this gap and help boost future research, this paper provides a systematic survey of occluded person Re-ID. Through an in-depth analysis of the occlusion in person Re-ID, most existing methods are found to only consider part of the problems brought by occlusion. Therefore, we review occlusion-related person Re-ID methods from the perspective of issues and solutions. We summarize four issues caused by occlusion in person Re-ID, i.e., position misalignment, scale misalignment, noisy information, and missing information. The occlusion-related methods addressing different issues are then categorized and introduced accordingly. After that, we summarize and compare the performance of recent occluded person Re-ID methods on four popular datasets: Partial-ReID, Partial-iLIDS, Occluded-ReID, and Occluded-DukeMTMC. Finally, we provide insights on promising future research directions.
2022-07-29 11:53:22 - "Do you follow me?": A Survey of Recent Approaches in Dialogue State Tracking - *Léo Jacqmin, Lina M. Rojas-Barahona, Benoit Favre* - `2207.14627v1` - [abs](http://arxiv.org/abs/2207.14627v1) - [pdf](http://arxiv.org/pdf/2207.14627v1) > While communicating with a user, a task-oriented dialogue system has to track the user's needs at each turn according to the conversation history. This process called dialogue state tracking (DST) is crucial because it directly informs the downstream dialogue policy. DST has received a lot of interest in recent years with the text-to-text paradigm emerging as the favored approach. In this review paper, we first present the task and its associated datasets. Then, considering a large number of recent publications, we identify highlights and advances of research in 2021-2022. Although neural approaches have enabled significant progress, we argue that some critical aspects of dialogue systems such as generalizability are still underexplored. To motivate future studies, we propose several research avenues.
2022-07-29 12:18:34 - Learning Disentangled Representations in the Imaging Domain - *Xiao Liu, Pedro Sanchez, Spyridon Thermos, Alison Q. O'Neil, Sotirios A. Tsaftaris* - `2108.12043v6` - [abs](http://arxiv.org/abs/2108.12043v6) - [pdf](http://arxiv.org/pdf/2108.12043v6) > Disentangled representation learning has been proposed as an approach to learning general representations even in the absence of, or with limited, supervision. A good general representation can be fine-tuned for new target tasks using modest amounts of data, or used directly in unseen domains achieving remarkable performance in the corresponding task. This alleviation of the data and annotation requirements offers tantalising prospects for applications in computer vision and healthcare. In this tutorial paper, we motivate the need for disentangled representations, revisit key concepts, and describe practical building blocks and criteria for learning such representations. We survey applications in medical imaging emphasising choices made in exemplar key works, and then discuss links to computer vision applications. We conclude by presenting limitations, challenges, and opportunities.
2022-07-29 13:52:24 - Big Data and Analytics Implementation in Tertiary Institutions to Predict Students Performance in Nigeria - *Ozioma Collins Oguine, Kanyifeechukwu Jane Oguine, Hashim Ibrahim Bisallah* - `2207.14677v1` - [abs](http://arxiv.org/abs/2207.14677v1) - [pdf](http://arxiv.org/pdf/2207.14677v1) > The term Big Data has been coined to refer to the gargantuan bulk of data that cannot be dealt with by traditional data-handling techniques. Big Data is still a novel concept, and in the following literature, we intend to elaborate on it in a palpable fashion. It commences with the concept of the subject in itself, along with its properties and the two general approaches to dealing with it. Big Data provides an opportunity for educational Institutions to use their Information Technology resources strategically to improve educational quality, guide students to higher completion rates and improve student persistence and outcomes. This paper explores the attributes of big data that are relevant to educational institutions, investigates the factors influencing the adoption of big data and analytics in learning institutions, and seeks to establish the limiting factors hindering the use of big data in Institutions of higher learning. A survey research design was adopted in conducting this research, and Questionnaires were the instrument employed for data collection.
2022-07-29 15:27:47 - Computer Vision Methods for the Microstructural Analysis of Materials: The State-of-the-art and Future Perspectives - *Khaled Alrfou, Amir Kordijazi, Tian Zhao* - `2208.04149v1` - [abs](http://arxiv.org/abs/2208.04149v1) - [pdf](http://arxiv.org/pdf/2208.04149v1) > Finding quantitative descriptors representing the microstructural features of a given material is an ongoing research area in the paradigm of Materials-by-Design. Historically, microstructural analysis mostly relies on qualitative descriptions. However, to build a robust and accurate process-structure-properties relationship, which is required for designing new advanced high-performance materials, the extraction of quantitative and meaningful statistical data from the microstructural analysis is a critical step. In recent years, computer vision (CV) methods, especially those which are centered around convolutional neural network (CNN) algorithms have shown promising results for this purpose. This review paper focuses on the state-of-the-art CNN-based techniques that have been applied to various multi-scale microstructural image analysis tasks, including classification, object detection, segmentation, feature extraction, and reconstruction. Additionally, we identified the main challenges with regard to the application of these methods to materials science research. Finally, we discussed some possible future directions of research in this area. In particular, we emphasized the application of transformer-based models and their capabilities to improve the microstructural analysis of materials.
2022-07-30 05:18:10 - Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond - *Amir Feder, Katherine A. Keith, Emaad Manzoor, Reid Pryzant, Dhanya Sridhar, Zach Wood-Doughty, Jacob Eisenstein, Justin Grimmer, Roi Reichart, Margaret E. Roberts, Brandon M. Stewart, Victor Veitch, Diyi Yang* - `2109.00725v2` - [abs](http://arxiv.org/abs/2109.00725v2) - [pdf](http://arxiv.org/pdf/2109.00725v2) > A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets and clear articulations of the challenges and opportunities in the application of causal inference to the textual domain, with its unique properties. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects with text, encompassing settings where text is used as an outcome, treatment, or to address confounding. In addition, we explore potential uses of causal inference to improve the robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the NLP community.
2022-07-30 06:21:46 - Recent Advances and New Frontiers in Spiking Neural Networks - *Duzhen Zhang, Tielin Zhang, Shuncheng Jia, Qingyu Wang, Bo Xu* - `2204.07050v6` - [abs](http://arxiv.org/abs/2204.07050v6) - [pdf](http://arxiv.org/pdf/2204.07050v6) > In recent years, spiking neural networks (SNNs) have received extensive attention in brain-inspired intelligence due to their rich spatially-temporal dynamics, various encoding methods, and event-driven characteristics that naturally fit the neuromorphic hardware. With the development of SNNs, brain-inspired intelligence, an emerging research field inspired by brain science achievements and aiming at artificial general intelligence, is becoming hot. This paper reviews recent advances and discusses new frontiers in SNNs from five major research topics, including essential elements (i.e., spiking neuron models, encoding methods, and topology structures), neuromorphic datasets, optimization algorithms, software, and hardware frameworks. We hope our survey can help researchers understand SNNs better and inspire new works to advance this field.
2022-07-30 09:59:28 - A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond - *Chaoning Zhang, Chenshuang Zhang, Junha Song, John Seon Keun Yi, Kang Zhang, In So Kweon* - `2208.00173v1` - [abs](http://arxiv.org/abs/2208.00173v1) - [pdf](http://arxiv.org/pdf/2208.00173v1) > Masked autoencoders are scalable vision learners, as the title of MAE \cite{he2022masked}, which suggests that self-supervised learning (SSL) in vision might undertake a similar trajectory as in NLP. Specifically, generative pretext tasks with the masked prediction (e.g., BERT) have become a de facto standard SSL practice in NLP. By contrast, early attempts at generative methods in vision have been buried by their discriminative counterparts (like contrastive learning); however, the success of mask image modeling has revived the masking autoencoder (often termed denoising autoencoder in the past). As a milestone to bridge the gap with BERT in NLP, masked autoencoder has attracted unprecedented attention for SSL in vision and beyond. This work conducts a comprehensive survey of masked autoencoders to shed insight on a promising direction of SSL. As the first to review SSL with masked autoencoders, this work focuses on its application in vision by discussing its historical developments, recent progress, and implications for diverse applications.
2022-07-30 14:38:11 - VLP: A Survey on Vision-Language Pre-training - *Feilong Chen, Duzhen Zhang, Minglun Han, Xiuyi Chen, Jing Shi, Shuang Xu, Bo Xu* - `2202.09061v4` - [abs](http://arxiv.org/abs/2202.09061v4) - [pdf](http://arxiv.org/pdf/2202.09061v4) > In the past few years, the emergence of pre-training models has brought uni-modal fields such as computer vision (CV) and natural language processing (NLP) to a new era. Substantial works have shown they are beneficial for downstream uni-modal tasks and avoid training a new model from scratch. So can such pre-trained models be applied to multi-modal tasks? Researchers have explored this problem and made significant progress. This paper surveys recent advances and new frontiers in vision-language pre-training (VLP), including image-text and video-text pre-training. To give readers a better overall grasp of VLP, we first review its recent advances from five aspects: feature extraction, model architecture, pre-training objectives, pre-training datasets, and downstream tasks. Then, we summarize the specific VLP models in detail. Finally, we discuss the new frontiers in VLP. To the best of our knowledge, this is the first survey focused on VLP. We hope that this survey can shed light on future research in the VLP field.
2022-07-30 15:15:58 - Cause-and-Effect Analysis of ADAS: A Comparison Study between Literature Review and Complaint Data - *Jackie Ayoub, Zifei Wang, Meitang Li, Huizhong Guo, Rini Sherony, Shan Bao, Feng Zhou* - `2208.00249v1` - [abs](http://arxiv.org/abs/2208.00249v1) - [pdf](http://arxiv.org/pdf/2208.00249v1) > Advanced driver assistance systems (ADAS) are designed to improve vehicle safety. However, it is difficult to achieve such benefits without understanding the causes and limitations of the current ADAS and their possible solutions. This study 1) investigated the limitations and solutions of ADAS through a literature review, 2) identified the causes and effects of ADAS through consumer complaints using natural language processing models, and 3) compared the major differences between the two. These two lines of research identified similar categories of ADAS causes, including human factors, environmental factors, and vehicle factors. However, academic research focused more on human factors of ADAS issues and proposed advanced algorithms to mitigate such issues while drivers complained more of vehicle factors of ADAS failures, which led to associated top consequences. The findings from these two sources tend to complement each other and provide important implications for the improvement of ADAS in the future.
2022-07-31 06:04:15 - Exploring Attention-Aware Network Resource Allocation for Customized Metaverse Services - *Hongyang Du, Jiacheng Wang, Dusit Niyato, Jiawen Kang, Zehui Xiong, Xuemin, Shen, Dong In Kim* - `2208.00369v1` - [abs](http://arxiv.org/abs/2208.00369v1) - [pdf](http://arxiv.org/pdf/2208.00369v1) > Emerging with the support of computing and communications technologies, Metaverse is expected to bring users unprecedented service experiences. However, the increase in the number of Metaverse users places a heavy demand on network resources, especially for Metaverse services that are based on graphical extended reality and require rendering a plethora of virtual objects. To make efficient use of network resources and improve the Quality-of-Experience (QoE), we design an attention-aware network resource allocation scheme to achieve customized Metaverse services. The aim is to allocate more network resources to virtual objects in which users are more interested. We first discuss several key techniques related to Metaverse services, including QoE analysis, eye-tracking, and remote rendering. We then review existing datasets and propose the user-object-attention level (UOAL) dataset that contains the ground truth attention of 30 users to 96 objects in 1,000 images. A tutorial on how to use UOAL is presented. With the help of UOAL, we propose an attention-aware network resource allocation algorithm that has two steps, i.e., attention prediction and QoE maximization. Specially, we provide an overview of the designs of two types of attention prediction methods, i.e., interest-aware and time-aware prediction. By using the predicted user-object-attention values, network resources such as the rendering capacity of edge devices can be allocated optimally to maximize the QoE. Finally, we propose promising research directions related to Metaverse services.
2022-07-31 06:48:19 - Neuro-Symbolic Learning: Principles and Applications in Ophthalmology - *Muhammad Hassan, Haifei Guan, Aikaterini Melliou, Yuqi Wang, Qianhui Sun, Sen Zeng, Wen Liang, Yiwei Zhang, Ziheng Zhang, Qiuyue Hu, Yang Liu, Shunkai Shi, Lin An, Shuyue Ma, Ijaz Gul, Muhammad Akmal Rahee, Zhou You, Canyang Zhang, Vijay Kumar Pandey, Yuxing Han, Yongbing Zhang, Ming Xu, Qiming Huang, Jiefu Tan, Qi Xing, Peiwu Qin, Dongmei Yu* - `2208.00374v1` - [abs](http://arxiv.org/abs/2208.00374v1) - [pdf](http://arxiv.org/pdf/2208.00374v1) > Neural networks have been rapidly expanding in recent years, with novel strategies and applications. However, challenges such as interpretability, explainability, robustness, safety, trust, and sensibility remain unsolved in neural network technologies, despite the fact that they will unavoidably be addressed for critical applications. Attempts have been made to overcome the challenges in neural network computing by representing and embedding domain knowledge in terms of symbolic representations. Thus, the neuro-symbolic learning (NeSyL) notion emerged, which incorporates aspects of symbolic representation and bringing common sense into neural networks (NeSyL). In domains where interpretability, reasoning, and explainability are crucial, such as video and image captioning, question-answering and reasoning, health informatics, and genomics, NeSyL has shown promising outcomes. This review presents a comprehensive survey on the state-of-the-art NeSyL approaches, their principles, advances in machine and deep learning algorithms, applications such as opthalmology, and most importantly, future perspectives of this emerging field.
2022-07-31 08:33:25 - Towards Large-Scale Small Object Detection: Survey and Benchmarks - *Gong Cheng, Xiang Yuan, Xiwen Yao, Kebing Yan, Qinghua Zeng, Junwei Han* - `2207.14096v2` - [abs](http://arxiv.org/abs/2207.14096v2) - [pdf](http://arxiv.org/pdf/2207.14096v2) > With the rise of deep convolutional neural networks, object detection has achieved prominent advances in past years. However, such prosperity could not camouflage the unsatisfactory situation of Small Object Detection (SOD), one of the notoriously challenging tasks in computer vision, owing to the poor visual appearance and noisy representation caused by the intrinsic structure of small targets. In addition, large-scale dataset for benchmarking small object detection methods remains a bottleneck. In this paper, we first conduct a thorough review of small object detection. Then, to catalyze the development of SOD, we construct two large-scale Small Object Detection dAtasets (SODA), SODA-D and SODA-A, which focus on the Driving and Aerial scenarios respectively. SODA-D includes 24704 high-quality traffic images and 277596 instances of 9 categories. For SODA-A, we harvest 2510 high-resolution aerial images and annotate 800203 instances over 9 classes. The proposed datasets, as we know, are the first-ever attempt to large-scale benchmarks with a vast collection of exhaustively annotated instances tailored for multi-category SOD. Finally, we evaluate the performance of mainstream methods on SODA. We expect the released benchmarks could facilitate the development of SOD and spawn more breakthroughs in this field. Datasets and codes will be available soon at: \url{https://shaunyuan22.github.io/SODA}.
2022-07-31 09:01:25 - AI in Telemedicine: An Appraisal on Deep Learning-Based Approaches to Virtual Diagnostic Solutions (VDS) - *Ozioma Collins Oguine, Kanyifeechukwu Jane Oguine* - `2208.04690v1` - [abs](http://arxiv.org/abs/2208.04690v1) - [pdf](http://arxiv.org/pdf/2208.04690v1) > Advancements in Telemedicine as an approach to healthcare delivery have heralded a new dawn in modern Medicine. Its fast-paced development in our contemporary society is credence to the advances in Artificial Intelligence and Information Technology. This paper carries out a descriptive study to broadly explore AI's implementations in healthcare delivery with a more holistic view of the usability of various Telemedical Innovations in enhancing Virtual Diagnostic Solutions (VDS). This research further explores notable developments in Deep Learning model optimizations for Virtual Diagnostic Solutions. A further research review on the prospects of Virtual Diagnostic Solutions (VDS) and foreseeable challenges was also highlighted. Conclusively, this research gives a general overview of Artificial Intelligence in Telemedicine with a central focus on Deep Learning-based approaches to Virtual Diagnostic Solutions.
2022-07-31 15:09:47 - Latent Space Unsupervised Semantic Segmentation - *Knut J. Strømmen, Jim Tørresen, Ulysse Côté-Allard* - `2207.11067v2` - [abs](http://arxiv.org/abs/2207.11067v2) - [pdf](http://arxiv.org/pdf/2207.11067v2) > The development of compact and energy-efficient wearable sensors has led to an increase in the availability of biosignals. To analyze these continuously recorded, and often multidimensional, time series at scale, being able to conduct meaningful unsupervised data segmentation is an auspicious target. A common way to achieve this is to identify change-points within the time series as the segmentation basis. However, traditional change-point detection algorithms often come with drawbacks, limiting their real-world applicability. Notably, they generally rely on the complete time series to be available and thus cannot be used for real-time applications. Another common limitation is that they poorly (or cannot) handle the segmentation of multidimensional time series. Consequently, the main contribution of this work is to propose a novel unsupervised segmentation algorithm for multidimensional time series named Latent Space Unsupervised Semantic Segmentation (LS-USS), which was designed to work easily with both online and batch data. When comparing LS-USS against other state-of-the-art change-point detection algorithms on a variety of real-world datasets, in both the offline and real-time setting, LS-USS systematically achieves on par or better performances.
2022-07-31 17:33:09 - Clover: Towards A Unified Video-Language Alignment and Fusion Model - *Jingjia Huang, Yinan Li, Jiashi Feng, Xiaoshuai Sun, Rongrong Ji* - `2207.07885v2` - [abs](http://arxiv.org/abs/2207.07885v2) - [pdf](http://arxiv.org/pdf/2207.07885v2) > Building a universal video-language model for solving various video understanding tasks (e.g., text-video retrieval, video question answering) is an open challenge to the machine learning field. Towards this goal, most recent attempts train the models, usually consisting of uni-modal and cross-modal feature encoders, with supervised or pair-wise contrastive pre-text tasks. Though offering attractive generality, the resulted models have to compromise between efficiency and performance. We argue the flaws are caused by their pre-training strategies\textemdash they cannot well align and fuse features from different modalities simultaneously. We then introduce Clover -- a Correlated Video-Language pre-training method -- towards a universal video-language model for solving multiple video understanding tasks with neither performance nor efficiency compromise. It improves cross-modal feature alignment and fusion via a novel tri-modal alignment pre-training task. Additionally, we propose to enhance the tri-modal alignment via incorporating learning from masked samples and a novel pair-wise ranking loss. It establishes new state-of-the-arts on multiple downstream tasks, including three retrieval tasks for both zero-shot and fine-tuning settings, and eight video question answering tasks. Codes and pre-trained models will be released at https://github.com/LeeYN-43/Clover.
2022-07-31 17:44:57 - The impact of Twitter on political influence on the choice of a running mate: Social Network Analysis and Semantic Analysis -- A Review - *Immaculate Wanza, Irad Kamuti, David Gichohi, Kinyua Gikunda* - `2208.00479v1` - [abs](http://arxiv.org/abs/2208.00479v1) - [pdf](http://arxiv.org/pdf/2208.00479v1) > In this new era of social media, social networks are becoming increasingly important sources of user-generated content on the internet. These kinds of information resources, which include a lot of people's feelings, opinions, feedback, and reviews, are very useful for big businesses, markets, politics, journalism, and many other fields. Politics is one of the most talked-about and popular topics on social media networks right now. Many politicians use micro-blogging services like Twitter because they have a large number of followers and supporters on those networks. Politicians, political parties, political organizations, and foundations use social media networks to communicate with citizens ahead of time. Today, social media is used by hundreds of thousands of political groups and politicians. On these social media networks, every politician and political party has millions of followers, and politicians find new and innovative ways to urge individuals to participate in politics. Furthermore, social media assists politicians in various decision-making processes by providing recommendations, such as developing policies and strategies based on previous experiences, recommending and selecting suitable candidates for a particular constituency, recommending a suitable person for a particular position in the party, and launching a political campaign based on citizen sentiments on various issues and controversies, among other things. This research is a review on the use of social network analysis (SNA) and semantic analysis (SA) on the Twitter platform to study the supporters networks of political leaders because it can help in decision-making when predicting their political futures.
2022-07-31 21:03:40 - Assessing The Performance of YOLOv5 Algorithm for Detecting Volunteer Cotton Plants in Corn Fields at Three Different Growth Stages - *Pappu Kumar Yadav, J. Alex Thomasson, Stephen W. Searcy, Robert G. Hardin, Ulisses Braga-Neto, Sorin C. Popescu, Daniel E. Martin, Roberto Rodriguez, Karem Meza, Juan Enciso, Jorge Solorzano Diaz, Tianyi Wang* - `2208.00519v1` - [abs](http://arxiv.org/abs/2208.00519v1) - [pdf](http://arxiv.org/pdf/2208.00519v1) > The boll weevil (Anthonomus grandis L.) is a serious pest that primarily feeds on cotton plants. In places like Lower Rio Grande Valley of Texas, due to sub-tropical climatic conditions, cotton plants can grow year-round and therefore the left-over seeds from the previous season during harvest can continue to grow in the middle of rotation crops like corn (Zea mays L.) and sorghum (Sorghum bicolor L.). These feral or volunteer cotton (VC) plants when reach the pinhead squaring phase (5-6 leaf stage) can act as hosts for the boll weevil pest. The Texas Boll Weevil Eradication Program (TBWEP) employs people to locate and eliminate VC plants growing by the side of roads or fields with rotation crops but the ones growing in the middle of fields remain undetected. In this paper, we demonstrate the application of computer vision (CV) algorithm based on You Only Look Once version 5 (YOLOv5) for detecting VC plants growing in the middle of corn fields at three different growth stages (V3, V6, and VT) using unmanned aircraft systems (UAS) remote sensing imagery. All the four variants of YOLOv5 (s, m, l, and x) were used and their performances were compared based on classification accuracy, mean average precision (mAP), and F1-score. It was found that YOLOv5s could detect VC plants with a maximum classification accuracy of 98% and mAP of 96.3 % at the V6 stage of corn while YOLOv5s and YOLOv5m resulted in the lowest classification accuracy of 85% and YOLOv5m and YOLOv5l had the least mAP of 86.5% at the VT stage on images of size 416 x 416 pixels. The developed CV algorithm has the potential to effectively detect and locate VC plants growing in the middle of corn fields as well as expedite the management aspects of TBWEP.

2022-08

2022-08-01 06:49:07 - Studying writer-suggestion interaction: A qualitative study to understand writer interaction with aligned/misaligned next-phrase suggestion - *Advait Bhat, Saaket Agashe, Niharika Mohile, Parth Oberoi, Ravi Jangir, Anirudha Joshi* - `2208.00636v1` - [abs](http://arxiv.org/abs/2208.00636v1) - [pdf](http://arxiv.org/pdf/2208.00636v1) > We present an exploratory qualitative study to understand how writers interact with next-phrase suggestions. While there has been some quantitative research on the effects of suggestion systems on writing, there has been little qualitative work to understand how writers interact with suggestion systems and how it affects their writing process - specifically for a non-native but English writer. We conducted a study where amateur writers were asked to write two movie reviews each, one without suggestions and one with. We found writers interact with next-phrase suggestions in various complex ways - writers are able to abstract multiple parts of the suggestions and incorporate them within their writing - even when they disagree with the suggestion as a whole. The suggestion system also had various effects on the writing processes - contributing to different aspects of the writing process in unique ways. We propose a model of writer-suggestion interaction for writing with GPT-2 for a movie review writing task, followed by ways in which the model can be used for future research, along with outlining opportunities for research and design.
2022-08-01 13:51:53 - Assessing the Early Bird Heuristic (for Predicting Project Quality) - *N. C. Shrikanth, Tim Menzies* - `2105.11082v3` - [abs](http://arxiv.org/abs/2105.11082v3) - [pdf](http://arxiv.org/pdf/2105.11082v3) > Before researchers rush to reason across all available data or try complex methods, perhaps it is prudent to first check for simpler alternatives. Specifically, if the historical data has the most information in some small region, perhaps a model learned from that region would suffice for the rest of the project. To support this claim, we offer a case study with 240 projects, where we find that the information in those projects "clump" towards the earliest parts of the project. A quality prediction model learned from just the first 150 commits works as well, or better than state-of-the-art alternatives. Using just this "early bird" data, we can build models very quickly and very early in the project life cycle. Moreover, using this early bird method, we have shown that a simple model (with just a few features) generalizes to hundreds of projects. Based on this experience, we doubt that prior work on generalizing quality models may have needlessly complicated an inherently simple process. Further, prior work that focused on later-life cycle data needs to be revisited since their conclusions were drawn from relatively uninformative regions. Replication note: all our data and scripts are available here: https://github.com/snaraya7/early-bird
2022-08-01 13:58:56 - Inductive Biases for Deep Learning of Higher-Level Cognition - *Anirudh Goyal, Yoshua Bengio* - `2011.15091v4` - [abs](http://arxiv.org/abs/2011.15091v4) - [pdf](http://arxiv.org/pdf/2011.15091v4) > A fascinating hypothesis is that human and animal intelligence could be explained by a few principles (rather than an encyclopedic list of heuristics). If that hypothesis was correct, we could more easily both understand our own intelligence and build intelligent machines. Just like in physics, the principles themselves would not be sufficient to predict the behavior of complex systems like brains, and substantial computation might be needed to simulate human-like intelligence. This hypothesis would suggest that studying the kind of inductive biases that humans and animals exploit could help both clarify these principles and provide inspiration for AI research and neuroscience theories. Deep learning already exploits several key inductive biases, and this work considers a larger list, focusing on those which concern mostly higher-level and sequential conscious processing. The objective of clarifying these particular principles is that they could potentially help us build AI systems benefiting from humans' abilities in terms of flexible out-of-distribution and systematic generalization, which is currently an area where a large gap exists between state-of-the-art machine learning and human intelligence.
2022-08-01 14:17:51 - Computer vision-based analysis of buildings and built environments: A systematic review of current approaches - *Małgorzata B. Starzyńska, Robin Roussel, Sam Jacoby, Ali Asadipour* - `2208.00881v1` - [abs](http://arxiv.org/abs/2208.00881v1) - [pdf](http://arxiv.org/pdf/2208.00881v1) > Analysing 88 sources published from 2011 to 2021, this paper presents a first systematic review of the computer vision-based analysis of buildings and the built environments to assess its value to architectural and urban design studies. Following a multi-stage selection process, the types of algorithms and data sources used are discussed in respect to architectural applications such as a building classification, detail classification, qualitative environmental analysis, building condition survey, and building value estimation. This reveals current research gaps and trends, and highlights two main categories of research aims. First, to use or optimise computer vision methods for architectural image data, which can then help automate time-consuming, labour-intensive, or complex tasks of visual analysis. Second, to explore the methodological benefits of machine learning approaches to investigate new questions about the built environment by finding patterns and relationships between visual, statistical, and qualitative data, which can overcome limitations of conventional manual analysis. The growing body of research offers new methods to architectural and design studies, with the paper identifying future challenges and directions of research.
2022-08-01 15:05:26 - Benchmarking Visual-Inertial Deep Multimodal Fusion for Relative Pose Regression and Odometry-aided Absolute Pose Regression - *Felix Ott, Nisha Lakshmana Raichur, David Rügamer, Tobias Feigl, Heiko Neumann, Bernd Bischl, Christopher Mutschler* - `2208.00919v1` - [abs](http://arxiv.org/abs/2208.00919v1) - [pdf](http://arxiv.org/pdf/2208.00919v1) > Visual-inertial localization is a key problem in computer vision and robotics applications such as virtual reality, self-driving cars, and aerial vehicles. The goal is to estimate an accurate pose of an object when either the environment or the dynamics are known. Recent methods directly regress the pose using convolutional and spatio-temporal networks. Absolute pose regression (APR) techniques predict the absolute camera pose from an image input in a known scene. Odometry methods perform relative pose regression (RPR) that predicts the relative pose from a known object dynamic (visual or inertial inputs). The localization task can be improved by retrieving information of both data sources for a cross-modal setup, which is a challenging problem due to contradictory tasks. In this work, we conduct a benchmark to evaluate deep multimodal fusion based on PGO and attention networks. Auxiliary and Bayesian learning are integrated for the APR task. We show accuracy improvements for the RPR-aided APR task and for the RPR-RPR task for aerial vehicles and hand-held devices. We conduct experiments on the EuRoC MAV and PennCOSYVIO datasets, and record a novel industry dataset.
2022-08-01 19:37:33 - VacciNet: Towards a Smart Framework for Learning the Distribution Chain Optimization of Vaccines for a Pandemic - *Jayeeta Mondal, Jeet Dutta, Hrishav Bakul Barua* - `2208.01112v1` - [abs](http://arxiv.org/abs/2208.01112v1) - [pdf](http://arxiv.org/pdf/2208.01112v1) > Vaccinations against viruses have always been the need of the hour since long past. However, it is hard to efficiently distribute the vaccines (on time) to all the corners of a country, especially during a pandemic. Considering the vastness of the population, diversified communities, and demands of a smart society, it is an important task to optimize the vaccine distribution strategy in any country/state effectively. Although there is a profusion of data (Big Data) from various vaccine administration sites that can be mined to gain valuable insights about mass vaccination drives, very few attempts has been made towards revolutionizing the traditional mass vaccination campaigns to mitigate the socio-economic crises of pandemic afflicted countries. In this paper, we bridge this gap in studies and experimentation. We collect daily vaccination data which is publicly available and carefully analyze it to generate meaning-full insights and predictions. We put forward a novel framework leveraging Supervised Learning and Reinforcement Learning (RL) which we call VacciNet, that is capable of learning to predict the demand of vaccination in a state of a country as well as suggest optimal vaccine allocation in the state for minimum cost of procurement and supply. At the present, our framework is trained and tested with vaccination data of the USA.
2022-08-02 03:17:20 - Generative Adversarial Learning for Intelligent Trust Management in 6G Wireless Networks - *Liu Yang, Yun Li, Simon X. Yang, Yinzhi Lu, Tan Guo, Keping Yu* - `2208.01221v1` - [abs](http://arxiv.org/abs/2208.01221v1) - [pdf](http://arxiv.org/pdf/2208.01221v1) > Emerging six generation (6G) is the integration of heterogeneous wireless networks, which can seamlessly support anywhere and anytime networking. But high Quality-of-Trust should be offered by 6G to meet mobile user expectations. Artificial intelligence (AI) is considered as one of the most important components in 6G. Then AI-based trust management is a promising paradigm to provide trusted and reliable services. In this article, a generative adversarial learning-enabled trust management method is presented for 6G wireless networks. Some typical AI-based trust management schemes are first reviewed, and then a potential heterogeneous and intelligent 6G architecture is introduced. Next, the integration of AI and trust management is developed to optimize the intelligence and security. Finally, the presented AI-based trust management method is applied to secure clustering to achieve reliable and real-time communications. Simulation results have demonstrated its excellent performance in guaranteeing network security and service quality.
2022-08-02 05:25:35 - A Robust Morphological Approach for Semantic Segmentation of Very High Resolution Images - *Siddharth Saravanan, Aditya Challa, Sravan Danda* - `2208.01254v1` - [abs](http://arxiv.org/abs/2208.01254v1) - [pdf](http://arxiv.org/pdf/2208.01254v1) > State-of-the-art methods for semantic segmentation of images involve computationally intensive neural network architectures. Most of these methods are not adaptable to high-resolution image segmentation due to memory and other computational issues. Typical approaches in literature involve design of neural network architectures that can fuse global information from low-resolution images and local information from the high-resolution counterparts. However, architectures designed for processing high resolution images are unnecessarily complex and involve a lot of hyper parameters that can be difficult to tune. Also, most of these architectures require ground truth annotations of the high resolution images to train, which can be hard to obtain. In this article, we develop a robust pipeline based on mathematical morphological (MM) operators that can seamlessly extend any existing semantic segmentation algorithm to high resolution images. Our method does not require the ground truth annotations of the high resolution images. It is based on efficiently utilizing information from the low-resolution counterparts, and gradient information on the high-resolution images. We obtain high quality seeds from the inferred labels on low-resolution images using traditional morphological operators and propagate seed labels using a random walker to refine the semantic labels at the boundaries. We show that the semantic segmentation results obtained by our method beat the existing state-of-the-art algorithms on high-resolution images. We empirically prove the robustness of our approach to the hyper parameters used in our pipeline. Further, we characterize some necessary conditions under which our pipeline is applicable and provide an in-depth analysis of the proposed approach.
2022-08-02 06:57:06 - A Survey of Natural Language Generation - *Chenhe Dong, Yinghui Li, Haifan Gong, Miaoxin Chen, Junxin Li, Ying Shen, Min Yang* - `2112.11739v2` - [abs](http://arxiv.org/abs/2112.11739v2) - [pdf](http://arxiv.org/pdf/2112.11739v2) > This paper offers a comprehensive review of the research on Natural Language Generation (NLG) over the past two decades, especially in relation to data-to-text generation and text-to-text generation deep learning methods, as well as new applications of NLG technology. This survey aims to (a) give the latest synthesis of deep learning research on the NLG core tasks, as well as the architectures adopted in the field; (b) detail meticulously and comprehensively various NLG tasks and datasets, and draw attention to the challenges in NLG evaluation, focusing on different evaluation methods and their relationships; (c) highlight some future emphasis and relatively recent research issues that arise due to the increasing synergy between NLG and other artificial intelligence areas, such as computer vision, text and computational creativity.
2022-08-02 07:23:21 - An Introduction to Multi-Agent Reinforcement Learning and Review of its Application to Autonomous Mobility - *Lukas M. Schmidt, Johanna Brosig, Axel Plinge, Bjoern M. Eskofier, Christopher Mutschler* - `2203.07676v2` - [abs](http://arxiv.org/abs/2203.07676v2) - [pdf](http://arxiv.org/pdf/2203.07676v2) > Many scenarios in mobility and traffic involve multiple different agents that need to cooperate to find a joint solution. Recent advances in behavioral planning use Reinforcement Learning to find effective and performant behavior strategies. However, as autonomous vehicles and vehicle-to-X communications become more mature, solutions that only utilize single, independent agents leave potential performance gains on the road. Multi-Agent Reinforcement Learning (MARL) is a research field that aims to find optimal solutions for multiple agents that interact with each other. This work aims to give an overview of the field to researchers in autonomous mobility. We first explain MARL and introduce important concepts. Then, we discuss the central paradigms that underlie MARL algorithms, and give an overview of state-of-the-art methods and ideas in each paradigm. With this background, we survey applications of MARL in autonomous mobility scenarios and give an overview of existing scenarios and implementations.
2022-08-02 08:43:29 - Improving Few-Shot Learning through Multi-task Representation Learning Theory - *Quentin Bouniot, Ievgen Redko, Romaric Audigier, Angélique Loesch, Amaury Habrard* - `2010.01992v3` - [abs](http://arxiv.org/abs/2010.01992v3) - [pdf](http://arxiv.org/pdf/2010.01992v3) > In this paper, we consider the framework of multi-task representation (MTR) learning where the goal is to use source tasks to learn a representation that reduces the sample complexity of solving a target task. We start by reviewing recent advances in MTR theory and show that they can provide novel insights for popular meta-learning algorithms when analyzed within this framework. In particular, we highlight a fundamental difference between gradient-based and metric-based algorithms in practice and put forward a theoretical analysis to explain it. Finally, we use the derived insights to improve the performance of meta-learning methods via a new spectral-based regularization term and confirm its efficiency through experimental studies on few-shot classification benchmarks. To the best of our knowledge, this is the first contribution that puts the most recent learning bounds of MTR theory into practice for the task of few-shot classification.
2022-08-02 15:31:10 - Texture features in medical image analysis: a survey - *Faeze Kiani* - `2208.02046v1` - [abs](http://arxiv.org/abs/2208.02046v1) - [pdf](http://arxiv.org/pdf/2208.02046v1) > The texture is defined as spatial structure of the intensities of the pixels in an image that is repeated periodically in the whole image or regions, and makes the concept of the image. Texture, color and shape are three main components which are used by human visual system to recognize image contents. In this paper, first of all, efficient and updated texture analysis operators are survived with details. Next, some state-of-the-art methods are survived that use texture analysis in medical applications and disease diagnosis. Finally, different approaches are compared in terms of accuracy, dataset, application, etc. Results demonstrate that texture features separately or in joint of different feature sets such as deep, color or shape features provide high accuracy in medical image classification.
2022-08-02 17:01:06 - 20 years of network community detection - *Santo Fortunato, M. E. J. Newman* - `2208.00111v2` - [abs](http://arxiv.org/abs/2208.00111v2) - [pdf](http://arxiv.org/pdf/2208.00111v2) > A fundamental technical challenge in the analysis of network data is the automated discovery of communities - groups of nodes that are strongly connected or that share similar features or roles. In this commentary we review progress in the field over the last 20 years.
2022-08-02 18:05:26 - Diagnosis of Paratuberculosis in Histopathological Images Based on Explainable Artificial Intelligence and Deep Learning - *Tuncay Yiğit, Nilgün Şengöz, Özlem Özmen, Jude Hemanth, Ali Hakan Işık* - `2208.01674v1` - [abs](http://arxiv.org/abs/2208.01674v1) - [pdf](http://arxiv.org/pdf/2208.01674v1) > Artificial intelligence holds great promise in medical imaging, especially histopathological imaging. However, artificial intelligence algorithms cannot fully explain the thought processes during decision-making. This situation has brought the problem of explainability, i.e., the black box problem, of artificial intelligence applications to the agenda: an algorithm simply responds without stating the reasons for the given images. To overcome the problem and improve the explainability, explainable artificial intelligence (XAI) has come to the fore, and piqued the interest of many researchers. Against this backdrop, this study examines a new and original dataset using the deep learning algorithm, and visualizes the output with gradient-weighted class activation mapping (Grad-CAM), one of the XAI applications. Afterwards, a detailed questionnaire survey was conducted with the pathologists on these images. Both the decision-making processes and the explanations were verified, and the accuracy of the output was tested. The research results greatly help pathologists in the diagnosis of paratuberculosis.
2022-08-02 18:44:06 - Recognizing and Extracting Cybersecurtity-relevant Entities from Text - *Casey Hanks, Michael Maiden, Priyanka Ranade, Tim Finin, Anupam Joshi* - `2208.01693v1` - [abs](http://arxiv.org/abs/2208.01693v1) - [pdf](http://arxiv.org/pdf/2208.01693v1) > Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks and is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG). There is a strong need to develop community-accessible datasets to train existing AI-based cybersecurity pipelines to efficiently and accurately extract meaningful insights from CTI. We have created an initial unstructured CTI corpus from a variety of open sources that we are using to train and test cybersecurity entity models using the spaCy framework and exploring self-learning methods to automatically recognize cybersecurity entities. We also describe methods to apply cybersecurity domain entity linking with existing world knowledge from Wikidata. Our future work will survey and test spaCy NLP tools and create methods for continuous integration of new information extracted from text.
2022-08-02 19:51:43 - No Pattern, No Recognition: a Survey about Reproducibility and Distortion Issues of Text Clustering and Topic Modeling - *Marília Costa Rosendo Silva, Felipe Alves Siqueira, João Pedro Mantovani Tarrega, João Vitor Pataca Beinotti, Augusto Sousa Nunes, Miguel de Mattos Gardini, Vinícius Adolfo Pereira da Silva, Nádia Félix Felipe da Silva, André Carlos Ponce de Leon Ferreira de Carvalho* - `2208.01712v1` - [abs](http://arxiv.org/abs/2208.01712v1) - [pdf](http://arxiv.org/pdf/2208.01712v1) > Extracting knowledge from unlabeled texts using machine learning algorithms can be complex. Document categorization and information retrieval are two applications that may benefit from unsupervised learning (e.g., text clustering and topic modeling), including exploratory data analysis. However, the unsupervised learning paradigm poses reproducibility issues. The initialization can lead to variability depending on the machine learning algorithm. Furthermore, the distortions can be misleading when regarding cluster geometry. Amongst the causes, the presence of outliers and anomalies can be a determining factor. Despite the relevance of initialization and outlier issues for text clustering and topic modeling, the authors did not find an in-depth analysis of them. This survey provides a systematic literature review (2011-2022) of these subareas and proposes a common terminology since similar procedures have different terms. The authors describe research opportunities, trends, and open issues. The appendices summarize the theoretical background of the text vectorization, the factorization, and the clustering algorithms that are directly or indirectly related to the reviewed works.
2022-08-03 02:11:42 - A comprehensive survey on computer-aided diagnostic systems in diabetic retinopathy screening - *Meysam Tavakoli, Patrick Kelley* - `2208.01810v1` - [abs](http://arxiv.org/abs/2208.01810v1) - [pdf](http://arxiv.org/pdf/2208.01810v1) > Diabetes Mellitus (DM) can lead to significant microvasculature disruptions that eventually causes diabetic retinopathy (DR), or complications in the eye due to diabetes. If left unchecked, this disease can increase over time and eventually cause complete vision loss. The general method to detect such optical developments is through examining the vessels, optic nerve head, microaneurysms, haemorrhage, exudates, etc. from retinal images. Ultimately this is limited by the number of experienced ophthalmologists and the vastly growing number of DM cases. To enable earlier and efficient DR diagnosis, the field of ophthalmology requires robust computer aided diagnosis (CAD) systems. Our review is intended for anyone, from student to established researcher, who wants to understand what can be accomplished with CAD systems and their algorithms to modeling and where the field of retinal image processing in computer vision and pattern recognition is headed. For someone just getting started, we place a special emphasis on the logic, strengths and shortcomings of different databases and algorithms frameworks with a focus on very recent approaches.
2022-08-03 03:11:34 - Medical image registration using unsupervised deep neural network: A scoping literature review - *Samaneh Abbasi, Meysam Tavakoli, Hamid Reza Boveiri, Mohammad Amin Mosleh Shirazi, Raouf Khayami, Hedieh Khorasani, Reza Javidan, Alireza Mehdizadeh* - `2208.01825v1` - [abs](http://arxiv.org/abs/2208.01825v1) - [pdf](http://arxiv.org/pdf/2208.01825v1) > In medicine, image registration is vital in image-guided interventions and other clinical applications. However, it is a difficult subject to be addressed which by the advent of machine learning, there have been considerable progress in algorithmic performance has recently been achieved for medical image registration in this area. The implementation of deep neural networks provides an opportunity for some medical applications such as conducting image registration in less time with high accuracy, playing a key role in countering tumors during the operation. The current study presents a comprehensive scoping review on the state-of-the-art literature of medical image registration studies based on unsupervised deep neural networks is conducted, encompassing all the related studies published in this field to this date. Here, we have tried to summarize the latest developments and applications of unsupervised deep learning-based registration methods in the medical field. Fundamental and main concepts, techniques, statistical analysis from different viewpoints, novelties, and future directions are elaborately discussed and conveyed in the current comprehensive scoping review. Besides, this review hopes to help those active readers, who are riveted by this field, achieve deep insight into this exciting field.
2022-08-03 04:42:14 - Stochastic Neighbor Embedding with Gaussian and Student-t Distributions: Tutorial and Survey - *Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley* - `2009.10301v2` - [abs](http://arxiv.org/abs/2009.10301v2) - [pdf](http://arxiv.org/pdf/2009.10301v2) > Stochastic Neighbor Embedding (SNE) is a manifold learning and dimensionality reduction method with a probabilistic approach. In SNE, every point is consider to be the neighbor of all other points with some probability and this probability is tried to be preserved in the embedding space. SNE considers Gaussian distribution for the probability in both the input and embedding spaces. However, t-SNE uses the Student-t and Gaussian distributions in these spaces, respectively. In this tutorial and survey paper, we explain SNE, symmetric SNE, t-SNE (or Cauchy-SNE), and t-SNE with general degrees of freedom. We also cover the out-of-sample extension and acceleration for these methods.
2022-08-03 04:46:44 - Unified Framework for Spectral Dimensionality Reduction, Maximum Variance Unfolding, and Kernel Learning By Semidefinite Programming: Tutorial and Survey - *Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley* - `2106.15379v2` - [abs](http://arxiv.org/abs/2106.15379v2) - [pdf](http://arxiv.org/pdf/2106.15379v2) > This is a tutorial and survey paper on unification of spectral dimensionality reduction methods, kernel learning by Semidefinite Programming (SDP), Maximum Variance Unfolding (MVU) or Semidefinite Embedding (SDE), and its variants. We first explain how the spectral dimensionality reduction methods can be unified as kernel Principal Component Analysis (PCA) with different kernels. This unification can be interpreted as eigenfunction learning or representation of kernel in terms of distance matrix. Then, since the spectral methods are unified as kernel PCA, we say let us learn the best kernel for unfolding the manifold of data to its maximum variance. We first briefly introduce kernel learning by SDP for the transduction task. Then, we explain MVU in detail. Various versions of supervised MVU using nearest neighbors graph, by class-wise unfolding, by Fisher criterion, and by colored MVU are explained. We also explain out-of-sample extension of MVU using eigenfunctions and kernel mapping. Finally, we introduce other variants of MVU including action respecting embedding, relaxed MVU, and landmark MVU for big data.
2022-08-03 06:07:05 - To Be Announced - *Hans van Ditmarsch* - `2004.05802v3` - [abs](http://arxiv.org/abs/2004.05802v3) - [pdf](http://arxiv.org/pdf/2004.05802v3) > In this survey we review dynamic epistemic logics with modalities for quantification over information change. Of such logics we present complete axiomatizations, focussing on axioms involving the interaction between knowledge and such quantifiers, we report on their relative expressivity, on decidability and on the complexity of model checking and satisfiability, and on applications. We focus on open problems and new directions for research.
2022-08-03 06:46:42 - Adversarial Attacks on ASR Systems: An Overview - *Xiao Zhang, Hao Tan, Xuan Huang, Denghui Zhang, Keke Tang, Zhaoquan Gu* - `2208.02250v1` - [abs](http://arxiv.org/abs/2208.02250v1) - [pdf](http://arxiv.org/pdf/2208.02250v1) > With the development of hardware and algorithms, ASR(Automatic Speech Recognition) systems evolve a lot. As The models get simpler, the difficulty of development and deployment become easier, ASR systems are getting closer to our life. On the one hand, we often use APPs or APIs of ASR to generate subtitles and record meetings. On the other hand, smart speaker and self-driving car rely on ASR systems to control AIoT devices. In past few years, there are a lot of works on adversarial examples attacks against ASR systems. By adding a small perturbation to the waveforms, the recognition results make a big difference. In this paper, we describe the development of ASR system, different assumptions of attacks, and how to evaluate these attacks. Next, we introduce the current works on adversarial examples attacks from two attack assumptions: white-box attack and black-box attack. Different from other surveys, we pay more attention to which layer they perturb waveforms in ASR system, the relationship between these attacks, and their implementation methods. We focus on the effect of their works.
2022-08-03 07:43:31 - Evaluating and improving social awareness of energy communities through semantic network analysis of online news - *C. Piselli, A. Fronzetti Colladon, L. Segneri, A. L. Pisello* - `2208.01892v1` - [abs](http://arxiv.org/abs/2208.01892v1) - [pdf](http://arxiv.org/pdf/2208.01892v1) > The implementation of energy communities represents a cross-disciplinary phenomenon that has the potential to support the energy transition while fostering citizens' participation throughout the energy system and their exploitation of renewables. An important role is played by online information sources in engaging people in this process and increasing their awareness of associated benefits. In this view, this work analyses online news data on energy communities to understand people's awareness and the media importance of this topic. We use the Semantic Brand Score (SBS) indicator as an innovative measure of semantic importance, combining social network analysis and text mining methods. Results show different importance trends for energy communities and other energy and society-related topics, also allowing the identification of their connections. Our approach gives evidence to information gaps and possible actions that could be taken to promote a low-carbon energy transition.
2022-08-03 08:11:08 - Neural Dynamic Movement Primitives -- a survey - *Jože M Rožanec, Bojan Nemec* - `2208.01903v1` - [abs](http://arxiv.org/abs/2208.01903v1) - [pdf](http://arxiv.org/pdf/2208.01903v1) > One of the most important challenges in robotics is producing accurate trajectories and controlling their dynamic parameters so that the robots can perform different tasks. The ability to provide such motion control is closely related to how such movements are encoded. Advances on deep learning have had a strong repercussion in the development of novel approaches for Dynamic Movement Primitives. In this work, we survey scientific literature related to Neural Dynamic Movement Primitives, to complement existing surveys on Dynamic Movement Primitives.
2022-08-03 10:23:58 - Siamese Object Tracking for Unmanned Aerial Vehicle: A Review and Comprehensive Analysis - *Changhong Fu, Kunhan Lu, Guangze Zheng, Junjie Ye, Ziang Cao, Bowen Li, Geng Lu* - `2205.04281v2` - [abs](http://arxiv.org/abs/2205.04281v2) - [pdf](http://arxiv.org/pdf/2205.04281v2) > Unmanned aerial vehicle (UAV)-based visual object tracking has enabled a wide range of applications and attracted increasing attention in the field of intelligent transportation systems because of its versatility and effectiveness. As an emerging force in the revolutionary trend of deep learning, Siamese networks shine in UAV-based object tracking with their promising balance of accuracy, robustness, and speed. Thanks to the development of embedded processors and the gradual optimization of deep neural networks, Siamese trackers receive extensive research and realize preliminary combinations with UAVs. However, due to the UAV's limited onboard computational resources and the complex real-world circumstances, aerial tracking with Siamese networks still faces severe obstacles in many aspects. To further explore the deployment of Siamese networks in UAV-based tracking, this work presents a comprehensive review of leading-edge Siamese trackers, along with an exhaustive UAV-specific analysis based on the evaluation using a typical UAV onboard processor. Then, the onboard tests are conducted to validate the feasibility and efficacy of representative Siamese trackers in real-world UAV deployment. Furthermore, to better promote the development of the tracking community, this work analyzes the limitations of existing Siamese trackers and conducts additional experiments represented by low-illumination evaluations. In the end, prospects for the development of Siamese tracking for UAV-based intelligent transportation systems are deeply discussed. The unified framework of leading-edge Siamese trackers, i.e., code library, and the results of their experimental evaluations are available at https://github.com/vision4robotics/SiameseTracking4UAV .
2022-08-03 10:46:12 - Generalized Out-of-Distribution Detection: A Survey - *Jingkang Yang, Kaiyang Zhou, Yixuan Li, Ziwei Liu* - `2110.11334v2` - [abs](http://arxiv.org/abs/2110.11334v2) - [pdf](http://arxiv.org/pdf/2110.11334v2) > Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of machine learning systems. For instance, in autonomous driving, we would like the driving system to issue an alert and hand over the control to humans when it detects unusual scenes or objects that it has never seen during training time and cannot make a safe decision. The term, OOD detection, first emerged in 2017 and since then has received increasing attention from the research community, leading to a plethora of methods developed, ranging from classification-based to density-based to distance-based ones. Meanwhile, several other problems, including anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD), are closely related to OOD detection in terms of motivation and methodology. Despite common goals, these topics develop in isolation, and their subtle differences in definition and problem setting often confuse readers and practitioners. In this survey, we first present a unified framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD. Under our framework, these five problems can be seen as special cases or sub-tasks, and are easier to distinguish. We then review each of these five areas by summarizing their recent technical developments, with a special focus on OOD detection methodologies. We conclude this survey with open challenges and potential research directions.
2022-08-03 11:41:18 - AUC Maximization in the Era of Big Data and AI: A Survey - *Tianbao Yang, Yiming Ying* - `2203.15046v3` - [abs](http://arxiv.org/abs/2203.15046v3) - [pdf](http://arxiv.org/pdf/2203.15046v3) > Area under the ROC curve, a.k.a. AUC, is a measure of choice for assessing the performance of a classifier for imbalanced data. AUC maximization refers to a learning paradigm that learns a predictive model by directly maximizing its AUC score. It has been studied for more than two decades dating back to late 90s and a huge amount of work has been devoted to AUC maximization since then. Recently, stochastic AUC maximization for big data and deep AUC maximization for deep learning have received increasing attention and yielded dramatic impact for solving real-world problems. However, to the best our knowledge there is no comprehensive survey of related works for AUC maximization. This paper aims to address the gap by reviewing the literature in the past two decades. We not only give a holistic view of the literature but also present detailed explanations and comparisons of different papers from formulations to algorithms and theoretical guarantees. We also identify and discuss remaining and emerging issues for deep AUC maximization, and provide suggestions on topics for future work.
2022-08-03 16:13:24 - Multimodal sensor fusion in the latent representation space - *Robert J. Piechocki, Xiaoyang Wang, Mohammud J. Bocus* - `2208.02183v1` - [abs](http://arxiv.org/abs/2208.02183v1) - [pdf](http://arxiv.org/pdf/2208.02183v1) > A new method for multimodal sensor fusion is introduced. The technique relies on a two-stage process. In the first stage, a multimodal generative model is constructed from unlabelled training data. In the second stage, the generative model serves as a reconstruction prior and the search manifold for the sensor fusion tasks. The method also handles cases where observations are accessed only via subsampling i.e. compressed sensing. We demonstrate the effectiveness and excellent performance on a range of multimodal fusion experiments such as multisensory classification, denoising, and recovery from subsampled observations.
2022-08-03 16:41:54 - Action Spotting using Dense Detection Anchors Revisited: Submission to the SoccerNet Challenge 2022 - *João V. B. Soares, Avijit Shah* - `2206.07846v2` - [abs](http://arxiv.org/abs/2206.07846v2) - [pdf](http://arxiv.org/pdf/2206.07846v2) > This brief technical report describes our submission to the Action Spotting SoccerNet Challenge 2022. The challenge was part of the CVPR 2022 ActivityNet Workshop. Our submission was based on a recently proposed method which focuses on increasing temporal precision via a densely sampled set of detection anchors. Due to its emphasis on temporal precision, this approach had shown significant improvements in the tight average-mAP metric. Tight average-mAP was used as the evaluation criterion for the challenge, and is defined using small temporal evaluation tolerances, thus being more sensitive to small temporal errors. In order to further improve results, here we introduce small changes in the pre- and post-processing steps, and also combine different input feature types via late fusion. These changes brought improvements that helped us achieve the first place in the challenge and also led to a new state-of-the-art on SoccerNet's test set when using the dataset's standard experimental protocol. This report briefly reviews the action spotting method based on dense detection anchors, then focuses on the modifications introduced for the challenge. We also describe the experimental protocols and training procedures we used, and finally present our results.
2022-08-04 10:13:05 - On the use of Artificial Neural Networks in Topology Optimisation - *Rebekka V. Woldseth, Niels Aage, J. Andreas Bærentzen, Ole Sigmund* - `2208.02563v1` - [abs](http://arxiv.org/abs/2208.02563v1) - [pdf](http://arxiv.org/pdf/2208.02563v1) > The question of how methods from the field of artificial intelligence can help improve the conventional frameworks for topology optimisation has received increasing attention over the last few years. Motivated by the capabilities of neural networks in image analysis, different model-variations aimed at obtaining iteration-free topology optimisation have been proposed with varying success. Other works focused on speed-up through replacing expensive optimisers and state solvers, or reducing the design-space have been attempted, but have not yet received the same attention. The portfolio of articles presenting different applications has as such become extensive, but few real breakthroughs have yet been celebrated. An overall trend in the literature is the strong faith in the "magic" of artificial intelligence and thus misunderstandings about the capabilities of such methods. The aim of this article is therefore to present a critical review of the current state of research in this field. To this end, an overview of the different model-applications is presented, and efforts are made to identify reasons for the overall lack of convincing success. A thorough analysis identifies and differentiates between problematic and promising aspects of existing models. The resulting findings are used to detail recommendations believed to encourage avenues of potential scientific progress for further research within the field.
2022-08-04 13:32:56 - ATP: A holistic attention integrated approach to enhance ABSA - *Ashish Kumar, Vasundhra Dahiya, Aditi Sharan* - `2208.02653v1` - [abs](http://arxiv.org/abs/2208.02653v1) - [pdf](http://arxiv.org/pdf/2208.02653v1) > Aspect based sentiment analysis (ABSA) deals with the identification of the sentiment polarity of a review sentence towards a given aspect. Deep Learning sequential models like RNN, LSTM, and GRU are current state-of-the-art methods for inferring the sentiment polarity. These methods work well to capture the contextual relationship between the words of a review sentence. However, these methods are insignificant in capturing long-term dependencies. Attention mechanism plays a significant role by focusing only on the most crucial part of the sentence. In the case of ABSA, aspect position plays a vital role. Words near to aspect contribute more while determining the sentiment towards the aspect. Therefore, we propose a method that captures the position based information using dependency parsing tree and helps attention mechanism. Using this type of position information over a simple word-distance-based position enhances the deep learning model's performance. We performed the experiments on SemEval'14 dataset to demonstrate the effect of dependency parsing relation-based attention for ABSA.
2022-08-04 15:39:00 - Understanding the QuickXPlain Algorithm: Simple Explanation and Formal Proof - *Patrick Rodler* - `2001.01835v3` - [abs](http://arxiv.org/abs/2001.01835v3) - [pdf](http://arxiv.org/pdf/2001.01835v3) > In his seminal paper of 2004, Ulrich Junker proposed the QuickXPlain algorithm, which provides a divide-and-conquer computation strategy to find within a given set an irreducible subset with a particular (monotone) property. Beside its original application in the domain of constraint satisfaction problems, the algorithm has since then found widespread adoption in areas as different as model-based diagnosis, recommender systems, verification, or the Semantic Web. This popularity is due to the frequent occurrence of the problem of finding irreducible subsets on the one hand, and to QuickXPlain's general applicability and favorable computational complexity on the other hand. However, although (we regularly experience) people are having a hard time understanding QuickXPlain and seeing why it works correctly, a proof of correctness of the algorithm has never been published. This is what we account for in this work, by explaining QuickXPlain in a novel tried and tested way and by presenting an intelligible formal proof of it. Apart from showing the correctness of the algorithm and excluding the later detection of errors (proof and trust effect), the added value of the availability of a formal proof is, e.g., (i) that the workings of the algorithm often become completely clear only after studying, verifying and comprehending the proof (didactic effect), (ii) the shown proof methodology can be used as a guidance for proving other recursive algorithms (transfer effect), and (iii) the possibility of providing "gapless" correctness proofs of systems that rely on (results computed by) QuickXPlain, such as numerous model-based debuggers (completeness effect).
2022-08-04 16:58:31 - Absolute Triangulation Algorithms for Space Exploration - *Sebastien Henry, John A. Christian* - `2205.12197v2` - [abs](http://arxiv.org/abs/2205.12197v2) - [pdf](http://arxiv.org/pdf/2205.12197v2) > Images are an important source of information for spacecraft navigation and for three-dimensional reconstruction of observed space objects. Both of these applications take the form of a triangulation problem when the camera has a known attitude and the measurements extracted from the image are line of sight (LOS) directions. This work provides a comprehensive review of the history and theoretical foundations of triangulation. A variety of classical triangulation algorithms are reviewed, including a number of suboptimal linear methods (many LOS measurements) and the optimal method of Hartley and Sturm (only two LOS measurements). It is shown that the optimal many-measurement case may be solved without iteration as a linear system using the new Linear Optimal Sine Triangulation (LOST) method. Both LOST and the polynomial method of Hartley and Sturm provide the same result in the case of only two measurements. The various triangulation algorithms are assessed with a few numerical examples, including planetary terrain relative navigation, angles-only optical navigation at Uranus, 3-D reconstruction of Notre-Dame de Paris, and angles-only relative navigation.
2022-08-04 17:28:19 - Generalization Analysis of Message Passing Neural Networks on Large Random Graphs - *Sohir Maskey, Ron Levie, Yunseok Lee, Gitta Kutyniok* - `2202.00645v6` - [abs](http://arxiv.org/abs/2202.00645v6) - [pdf](http://arxiv.org/pdf/2202.00645v6) > Message passing neural networks (MPNN) have seen a steep rise in popularity since their introduction as generalizations of convolutional neural networks to graph-structured data, and are now considered state-of-the-art tools for solving a large variety of graph-focused problems. We study the generalization error of MPNNs in graph classification and regression. We assume that graphs of different classes are sampled from different random graph models. We show that, when training a MPNN on a dataset sampled from such a distribution, the generalization gap increases in the complexity of the MPNN, and decreases, not only with respect to the number of training samples, but also with the average number of nodes in the graphs. This shows how a MPNN with high complexity can generalize from a small dataset of graphs, as long as the graphs are large. The generalization bound is derived from a uniform convergence result, that shows that any MPNN, applied on a graph, approximates the MPNN applied on the geometric model that the graph discretizes.
2022-08-04 17:53:17 - Vision-Centric BEV Perception: A Survey - *Yuexin Ma, Tai Wang, Xuyang Bai, Huitong Yang, Yuenan Hou, Yaming Wang, Yu Qiao, Ruigang Yang, Dinesh Manocha, Xinge Zhu* - `2208.02797v1` - [abs](http://arxiv.org/abs/2208.02797v1) - [pdf](http://arxiv.org/pdf/2208.02797v1) > Vision-centric BEV perception has recently received increased attention from both industry and academia due to its inherent merits, including presenting a natural representation of the world and being fusion-friendly. With the rapid development of deep learning, numerous methods have been proposed to address the vision-centric BEV perception. However, there is no recent survey for this novel and growing research field. To stimulate its future research, this paper presents a comprehensive survey of recent progress of vision-centric BEV perception and its extensions. It collects and organizes the recent knowledge, and gives a systematic review and summary of commonly used algorithms. It also provides in-depth analyses and comparative results on several BEV perception tasks, facilitating the comparisons of future works and inspiring future research directions. Moreover, empirical implementation details are also discussed and shown to benefit the development of related algorithms.
2022-08-05 12:26:54 - Motivating explanations in Bayesian networks using MAP-independence - *Johan Kwisthout* - `2208.03121v1` - [abs](http://arxiv.org/abs/2208.03121v1) - [pdf](http://arxiv.org/pdf/2208.03121v1) > In decision support systems the motivation and justification of the system's diagnosis or classification is crucial for the acceptance of the system by the human user. In Bayesian networks a diagnosis or classification is typically formalized as the computation of the most probable joint value assignment to the hypothesis variables, given the observed values of the evidence variables (generally known as the MAP problem). While solving the MAP problem gives the most probable explanation of the evidence, the computation is a black box as far as the human user is concerned and it does not give additional insights that allow the user to appreciate and accept the decision. For example, a user might want to know to whether an unobserved variable could potentially (upon observation) impact the explanation, or whether it is irrelevant in this aspect. In this paper we introduce a new concept, MAP- independence, which tries to capture this notion of relevance, and explore its role towards a potential justification of an inference to the best explanation. We formalize several computational problems based on this concept and assess their computational complexity.
2022-08-05 14:35:03 - Low-Resource Dense Retrieval for Open-Domain Question Answering: A Comprehensive Survey - *Xiaoyu Shen, Svitlana Vakulenko, Marco del Tredici, Gianni Barlacchi, Bill Byrne, Adrià de Gispert* - `2208.03197v1` - [abs](http://arxiv.org/abs/2208.03197v1) - [pdf](http://arxiv.org/pdf/2208.03197v1) > Dense retrieval (DR) approaches based on powerful pre-trained language models (PLMs) achieved significant advances and have become a key component for modern open-domain question-answering systems. However, they require large amounts of manual annotations to perform competitively, which is infeasible to scale. To address this, a growing body of research works have recently focused on improving DR performance under low-resource scenarios. These works differ in what resources they require for training and employ a diverse set of techniques. Understanding such differences is crucial for choosing the right technique under a specific low-resource scenario. To facilitate this understanding, we provide a thorough structured overview of mainstream techniques for low-resource DR. Based on their required resources, we divide the techniques into three main categories: (1) only documents are needed; (2) documents and questions are needed; and (3) documents and question-answer pairs are needed. For every technique, we introduce its general-form algorithm, highlight the open issues and pros and cons. Promising directions are outlined for future research.
2022-08-05 14:38:44 - A Survey on Sentence Embedding Models Performance for Patent Analysis - *Hamid Bekamiri, Daniel S. Hain, Roman Jurowetzki* - `2206.02690v3` - [abs](http://arxiv.org/abs/2206.02690v3) - [pdf](http://arxiv.org/pdf/2206.02690v3) > Patent data is an important source of knowledge for innovation research, while the technological similarity between pairs of patents is a key enabling indicator for patent analysis. Recently researchers have been using patent vector space models based on different NLP embeddings models to calculate the technological similarity between pairs of patents to help better understand innovations, patent landscaping, technology mapping, and patent quality evaluation. More often than not, Text Embedding is a vital precursor to patent analysis tasks. A pertinent question then arises: How should we measure and evaluate the accuracy of these embeddings? To the best of our knowledge, there is no comprehensive survey that builds a clear delineation of embedding models' performance for calculating patent similarity indicators. Therefore, in this study, we provide an overview of the accuracy of these algorithms based on patent classification performance and propose a standard library and dataset for assessing the accuracy of embeddings models based on PatentSBERTa approach. In a detailed discussion, we report the performance of the top 3 algorithms at section, class, and subclass levels. The results based on the first claim of patents show that PatentSBERTa, Bert-for-patents, and TF-IDF Weighted Word Embeddings have the best accuracy for computing sentence embeddings at the subclass level. According to the first results, the performance of the models in different classes varies, which shows researchers in patent analysis can utilize the results of this study to choose the best proper model based on the specific section of patent data they used.
2022-08-05 17:13:11 - Matching Papers and Reviewers at Large Conferences - *Kevin Leyton-Brown, Mausam, Yatin Nandwani, Hedayat Zarkoob, Chris Cameron, Neil Newman, Dinesh Raghu* - `2202.12273v4` - [abs](http://arxiv.org/abs/2202.12273v4) - [pdf](http://arxiv.org/pdf/2202.12273v4) > Peer-reviewed conferences, the main publication venues in CS, rely critically on matching highly qualified reviewers for each paper. Because of the growing scale of these conferences, the tight timelines on which they operate, and a recent surge in explicitly dishonest behavior, there is now no alternative to performing this matching in an automated way. This paper studies a novel reviewer-paper matching approach that was recently deployed in the 35th AAAI Conference on Artificial Intelligence (AAAI 2021), and has since been adopted (wholly or partially) by other conferences including ICML 2022, AAAI 2022, and IJCAI 2022. This approach has three main elements: (1) collecting and processing input data to identify problematic matches and generate reviewer-paper scores; (2) formulating and solving an optimization problem to find good reviewer-paper matchings; and (3) a two-phase reviewing process that shifts reviewing resources away from papers likely to be rejected and towards papers closer to the decision boundary. This paper also describes an evaluation of these innovations based on an extensive post-hoc analysis on real data -- including a comparison with the matching algorithm used in AAAI's previous (2020) iteration -- and supplements this with additional numerical experimentation.
2022-08-05 20:11:18 - A Survey on Visual Map Localization Using LiDARs and Cameras - *Elhousni Mahdi, Huang Xinming* - `2208.03376v1` - [abs](http://arxiv.org/abs/2208.03376v1) - [pdf](http://arxiv.org/pdf/2208.03376v1) > As the autonomous driving industry is slowly maturing, visual map localization is quickly becoming the standard approach to localize cars as accurately as possible. Owing to the rich data returned by visual sensors such as cameras or LiDARs, researchers are able to build different types of maps with various levels of details, and use them to achieve high levels of vehicle localization accuracy and stability in urban environments. Contrary to the popular SLAM approaches, visual map localization relies on pre-built maps, and is focused solely on improving the localization accuracy by avoiding error accumulation or drift. We define visual map localization as a two-stage process. At the stage of place recognition, the initial position of the vehicle in the map is determined by comparing the visual sensor output with a set of geo-tagged map regions of interest. Subsequently, at the stage of map metric localization, the vehicle is tracked while it moves across the map by continuously aligning the visual sensors' output with the current area of the map that is being traversed. In this paper, we survey, discuss and compare the latest methods for LiDAR based, camera based and cross-modal visual map localization for both stages, in an effort to highlight the strength and weakness of each approach.
2022-08-05 23:20:37 - Slice-level Detection of Intracranial Hemorrhage on CT Using Deep Descriptors of Adjacent Slices - *Dat T. Ngo, Hieu H. Pham, Thao T. B. Nguyen, Hieu T. Nguyen, Dung B. Nguyen, Ha Q. Nguyen* - `2208.03403v1` - [abs](http://arxiv.org/abs/2208.03403v1) - [pdf](http://arxiv.org/pdf/2208.03403v1) > The rapid development in representation learning techniques and the availability of large-scale medical imaging data have to a rapid increase in the use of machine learning in the 3D medical image analysis. In particular, deep convolutional neural networks (D-CNNs) have been key players and were adopted by the medical imaging community to assist clinicians and medical experts in disease diagnosis. However, training deep neural networks such as D-CNN on high-resolution 3D volumes of Computed Tomography (CT) scans for diagnostic tasks poses formidable computational challenges. This raises the need of developing deep learning-based approaches that are robust in learning representations in 2D images, instead 3D scans. In this paper, we propose a new strategy to train \emph{slice-level} classifiers on CT scans based on the descriptors of the adjacent slices along the axis. In particular, each of which is extracted through a convolutional neural network (CNN). This method is applicable to CT datasets with per-slice labels such as the RSNA Intracranial Hemorrhage (ICH) dataset, which aims to predict the presence of ICH and classify it into 5 different sub-types. We obtain a single model in the top 4\% best-performing solutions of the RSNA ICH challenge, where model ensembles are allowed. Experiments also show that the proposed method significantly outperforms the baseline model on CQ500. The proposed method is general and can be applied for other 3D medical diagnosis tasks such as MRI imaging. To encourage new advances in the field, we will make our codes and pre-trained model available upon acceptance of the paper.
2022-08-06 03:54:53 - Laplacian-Based Dimensionality Reduction Including Spectral Clustering, Laplacian Eigenmap, Locality Preserving Projection, Graph Embedding, and Diffusion Map: Tutorial and Survey - *Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley* - `2106.02154v2` - [abs](http://arxiv.org/abs/2106.02154v2) - [pdf](http://arxiv.org/pdf/2106.02154v2) > This is a tutorial and survey paper for nonlinear dimensionality and feature extraction methods which are based on the Laplacian of graph of data. We first introduce adjacency matrix, definition of Laplacian matrix, and the interpretation of Laplacian. Then, we cover the cuts of graph and spectral clustering which applies clustering in a subspace of data. Different optimization variants of Laplacian eigenmap and its out-of-sample extension are explained. Thereafter, we introduce the locality preserving projection and its kernel variant as linear special cases of Laplacian eigenmap. Versions of graph embedding are then explained which are generalized versions of Laplacian eigenmap and locality preserving projection. Finally, diffusion map is introduced which is a method based on Laplacian of data and random walks on the data graph.
2022-08-06 12:06:35 - A review of Deep learning Techniques for COVID-19 identification on Chest CT images - *Briskline Kiruba S, Petchiammal A, D. Murugan* - `2208.00032v2` - [abs](http://arxiv.org/abs/2208.00032v2) - [pdf](http://arxiv.org/pdf/2208.00032v2) > The current COVID-19 pandemic is a serious threat to humanity that directly affects the lungs. Automatic identification of COVID-19 is a challenge for health care officials. The standard gold method for diagnosing COVID-19 is Reverse Transcription Polymerase Chain Reaction (RT-PCR) to collect swabs from affected people. Some limitations encountered while collecting swabs are related to accuracy and longtime duration. Chest CT (Computed Tomography) is another test method that helps healthcare providers quickly identify the infected lung areas. It was used as a supporting tool for identifying COVID-19 in an earlier stage. With the help of deep learning, the CT imaging characteristics of COVID-19. Researchers have proven it to be highly effective for COVID-19 CT image classification. In this study, we review the recent deep learning techniques that can use to detect the COVID-19 disease. Relevant studies were collected by various databases such as Web of Science, Google Scholar, and PubMed. Finally, we compare the results of different deep learning models, and CT image analysis is discussed.
2022-08-06 13:13:23 - Semiconductor Defect Detection by Hybrid Classical-Quantum Deep Learning - *YuanFu Yang, Min Sun* - `2208.03514v1` - [abs](http://arxiv.org/abs/2208.03514v1) - [pdf](http://arxiv.org/pdf/2208.03514v1) > With the rapid development of artificial intelligence and autonomous driving technology, the demand for semiconductors is projected to rise substantially. However, the massive expansion of semiconductor manufacturing and the development of new technology will bring many defect wafers. If these defect wafers have not been correctly inspected, the ineffective semiconductor processing on these defect wafers will cause additional impact to our environment, such as excessive carbon dioxide emission and energy consumption. In this paper, we utilize the information processing advantages of quantum computing to promote the defect learning defect review (DLDR). We propose a classical-quantum hybrid algorithm for deep learning on near-term quantum processors. By tuning parameters implemented on it, quantum circuit driven by our framework learns a given DLDR task, include of wafer defect map classification, defect pattern classification, and hotspot detection. In addition, we explore parametrized quantum circuits with different expressibility and entangling capacities. These results can be used to build a future roadmap to develop circuit-based quantum deep learning for semiconductor defect detection.
2022-08-06 20:19:08 - Revisiting Gaussian Neurons for Online Clustering with Unknown Number of Clusters - *Ole Christian Eidheim* - `2205.00920v2` - [abs](http://arxiv.org/abs/2205.00920v2) - [pdf](http://arxiv.org/pdf/2205.00920v2) > Despite the recent success of artificial neural networks, more biologically plausible learning methods may be needed to resolve the weaknesses of backpropagation trained models such as catastrophic forgetting and adversarial attacks. Although these weaknesses are not specifically addressed, a novel local learning rule is presented that performs online clustering with an upper limit on the number of clusters to be found rather than a fixed cluster count. Instead of using orthogonal weight or output activation constraints, activation sparsity is achieved by mutual repulsion of lateral Gaussian neurons ensuring that multiple neuron centers cannot occupy the same location in the input domain. An update method is also presented for adjusting the widths of the Gaussian neurons in cases where the data samples can be represented by means and variances. The algorithms were applied on the MNIST and CIFAR-10 datasets to create filters capturing the input patterns of pixel patches of various sizes. The experimental results demonstrate stability in the learned parameters across a large number of training samples.
2022-08-07 17:12:12 - Video-based Human Action Recognition using Deep Learning: A Review - *Hieu H. Pham, Louahdi Khoudour, Alain Crouzil, Pablo Zegers, Sergio A. Velastin* - `2208.03775v1` - [abs](http://arxiv.org/abs/2208.03775v1) - [pdf](http://arxiv.org/pdf/2208.03775v1) > Human action recognition is an important application domain in computer vision. Its primary aim is to accurately describe human actions and their interactions from a previously unseen data sequence acquired by sensors. The ability to recognize, understand, and predict complex human actions enables the construction of many important applications such as intelligent surveillance systems, human-computer interfaces, health care, security, and military applications. In recent years, deep learning has been given particular attention by the computer vision community. This paper presents an overview of the current state-of-the-art in action recognition using video analysis with deep learning techniques. We present the most important deep learning models for recognizing human actions, and analyze them to provide the current progress of deep learning algorithms applied to solve human action recognition problems in realistic videos highlighting their advantages and disadvantages. Based on the quantitative analysis using recognition accuracies reported in the literature, our study identifies state-of-the-art deep architectures in action recognition and then provides current trends and open problems for future works in this field.
2022-08-08 13:01:06 - Automated image analysis in large-scale cellular electron microscopy: A literature survey - *Anusha Aswath, Ahmad Alsahaf, Ben N. G. Giepmans, George Azzopardi* - `2206.07171v2` - [abs](http://arxiv.org/abs/2206.07171v2) - [pdf](http://arxiv.org/pdf/2206.07171v2) > Large-scale electron microscopy (EM) datasets generated using (semi-) automated microscopes are becoming the standard in EM. Given the vast amounts of data, manual analysis of all data is not feasible, thus automated analysis is crucial. The main challenges in automated analysis include the annotation that is needed to analyse and interpret biomedical images, coupled with achieving high-throughput. Here, we review the current state-of-the-art of automated computer techniques and major challenges for the analysis of structures in cellular EM. The advanced computer vision, deep learning and software tools that have been developed in the last five years for automatic biomedical image analysis are discussed with respect to annotation, segmentation and scalability for EM data. Integration of automatic image acquisition and analysis will allow for high-throughput analysis of millimeter-range datasets with nanometer resolution.
2022-08-08 14:04:38 - Abstractive Meeting Summarization: A Survey - *Virgile Rennard, Guokan Shang, Julie Hunter, Michalis Vazirgiannis* - `2208.04163v1` - [abs](http://arxiv.org/abs/2208.04163v1) - [pdf](http://arxiv.org/pdf/2208.04163v1) > Recent advances in deep learning, and especially the invention of encoder-decoder architectures, has significantly improved the performance of abstractive summarization systems. While the majority of research has focused on written documents, we have observed an increasing interest in the summarization of dialogues and multi-party conversation over the past few years. A system that could reliably transform the audio or transcript of a human conversation into an abridged version that homes in on the most important points of the discussion would be valuable in a wide variety of real-world contexts, from business meetings to medical consultations to customer service calls. This paper focuses on abstractive summarization for multi-party meetings, providing a survey of the challenges, datasets and systems relevant to this task and a discussion of promising directions for future study.
2022-08-08 17:50:53 - Ensemble deep learning: A review - *M. A. Ganaie, Minghui Hu, A. K. Malik, M. Tanveer, P. N. Suganthan* - `2104.02395v3` - [abs](http://arxiv.org/abs/2104.02395v3) - [pdf](http://arxiv.org/pdf/2104.02395v3) > Ensemble learning combines several individual models to obtain better generalization performance. Currently, deep learning architectures are showing better performance compared to the shallow or traditional models. Deep ensemble learning models combine the advantages of both the deep learning models as well as the ensemble learning such that the final model has better generalization performance. This paper reviews the state-of-art deep ensemble models and hence serves as an extensive summary for the researchers. The ensemble models are broadly categorised into bagging, boosting, stacking, negative correlation based deep ensemble models, explicit/implicit ensembles, homogeneous/heterogeneous ensemble, decision fusion strategies based deep ensemble models. Applications of deep ensemble models in different domains are also briefly discussed. Finally, we conclude this paper with some potential future research directions.
2022-08-08 17:59:11 - 3D Vision with Transformers: A Survey - *Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang* - `2208.04309v1` - [abs](http://arxiv.org/abs/2208.04309v1) - [pdf](http://arxiv.org/pdf/2208.04309v1) > The success of the transformer architecture in natural language processing has recently triggered attention in the computer vision field. The transformer has been used as a replacement for the widely used convolution operators, due to its ability to learn long-range dependencies. This replacement was proven to be successful in numerous tasks, in which several state-of-the-art methods rely on transformers for better learning. In computer vision, the 3D field has also witnessed an increase in employing the transformer for 3D convolution neural networks and multi-layer perceptron networks. Although a number of surveys have focused on transformers in vision in general, 3D vision requires special attention due to the difference in data representation and processing when compared to 2D vision. In this work, we present a systematic and thorough review of more than 100 transformers methods for different 3D vision tasks, including classification, segmentation, detection, completion, pose estimation, and others. We discuss transformer design in 3D vision, which allows it to process data with various 3D representations. For each application, we highlight key properties and contributions of proposed transformer-based methods. To assess the competitiveness of these methods, we compare their performance to common non-transformer methods on 12 3D benchmarks. We conclude the survey by discussing different open directions and challenges for transformers in 3D vision. In addition to the presented papers, we aim to frequently update the latest relevant papers along with their corresponding implementations at: https://github.com/lahoud/3d-vision-transformers.
2022-08-08 18:57:24 - A Survey on RGB-D Datasets - *Alexandre Lopes, Roberto Souza, Helio Pedrini* - `2201.05761v2` - [abs](http://arxiv.org/abs/2201.05761v2) - [pdf](http://arxiv.org/pdf/2201.05761v2) > RGB-D data is essential for solving many problems in computer vision. Hundreds of public RGB-D datasets containing various scenes, such as indoor, outdoor, aerial, driving, and medical, have been proposed. These datasets are useful for different applications and are fundamental for addressing classic computer vision tasks, such as monocular depth estimation. This paper reviewed and categorized image datasets that include depth information. We gathered 203 datasets that contain accessible data and grouped them into three categories: scene/objects, body, and medical. We also provided an overview of the different types of sensors, depth applications, and we examined trends and future directions of the usage and creation of datasets containing depth data, and how they can be applied to investigate the development of generalizable machine learning models in the monocular depth estimation field.
2022-08-08 20:54:34 - Deep Learning Driven Natural Languages Text to SQL Query Conversion: A Survey - *Ayush Kumar, Parth Nagarkar, Prabhav Nalhe, Sanjeev Vijayakumar* - `2208.04415v1` - [abs](http://arxiv.org/abs/2208.04415v1) - [pdf](http://arxiv.org/pdf/2208.04415v1) > With the future striving toward data-centric decision-making, seamless access to databases is of utmost importance. There is extensive research on creating an efficient text-to-sql (TEXT2SQL) model to access data from the database. Using a Natural language is one of the best interfaces that can bridge the gap between the data and results by accessing the database efficiently, especially for non-technical users. It will open the doors and create tremendous interest among users who are well versed in technical skills or not very skilled in query languages. Even if numerous deep learning-based algorithms are proposed or studied, there still is very challenging to have a generic model to solve the data query issues using natural language in a real-work scenario. The reason is the use of different datasets in different studies, which comes with its limitations and assumptions. At the same time, we do lack a thorough understanding of these proposed models and their limitations with the specific dataset it is trained on. In this paper, we try to present a holistic overview of 24 recent neural network models studied in the last couple of years, including their architectures involving convolutional neural networks, recurrent neural networks, pointer networks, reinforcement learning, generative models, etc. We also give an overview of the 11 datasets that are widely used to train the models for TEXT2SQL technologies. We also discuss the future application possibilities of TEXT2SQL technologies for seamless data queries.
2022-08-08 23:14:51 - Denoising Induction Motor Sounds Using an Autoencoder - *Thanh Tran, Sebastian Bader, Jan Lundgren* - `2208.04462v1` - [abs](http://arxiv.org/abs/2208.04462v1) - [pdf](http://arxiv.org/pdf/2208.04462v1) > Denoising is the process of removing noise from sound signals while improving the quality and adequacy of the sound signals. Denoising sound has many applications in speech processing, sound events classification, and machine failure detection systems. This paper describes a method for creating an autoencoder to map noisy machine sounds to clean sounds for denoising purposes. There are several types of noise in sounds, for example, environmental noise and generated frequency-dependent noise from signal processing methods. Noise generated by environmental activities is environmental noise. In the factory, environmental noise can be created by vehicles, drilling, people working or talking in the survey area, wind, and flowing water. Those noises appear as spikes in the sound record. In the scope of this paper, we demonstrate the removal of generated noise with Gaussian distribution and the environmental noise with a specific example of the water sink faucet noise from the induction motor sounds. The proposed method was trained and verified on 49 normal function sounds and 197 horizontal misalignment fault sounds from the Machinery Fault Database (MAFAULDA). The mean square error (MSE) was used as the assessment criteria to evaluate the similarity between denoised sounds using the proposed autoencoder and the original sounds in the test set. The MSE is below or equal to 0.14 when denoise both types of noises on 15 testing sounds of the normal function category. The MSE is below or equal to 0.15 when denoising 60 testing sounds on the horizontal misalignment fault category. The low MSE shows that both the generated Gaussian noise and the environmental noise were almost removed from the original sounds with the proposed trained autoencoder.
2022-08-09 05:35:16 - Inconsistencies in Measuring Student Engagement in Virtual Learning -- A Critical Review - *Shehroz S. Khan, Ali Abedi, Tracey Colella* - `2208.04548v1` - [abs](http://arxiv.org/abs/2208.04548v1) - [pdf](http://arxiv.org/pdf/2208.04548v1) > In recent years, virtual learning has emerged as an alternative to traditional classroom teaching. The engagement of students in virtual learning can have a major impact on meeting learning objectives and program dropout risks. There exist many measurement instruments specifically geared to Student Engagement (SE) in virtual learning environments. In this critical review, we analyze these works and highlight inconsistencies in terms of differing engagement definitions and measurement scales. This diversity among existing researchers could potentially be problematic in comparing different annotations and building generalizable predictive models. We further discuss issues in terms of engagement annotations and design flaws. We analyze the existing SE annotation scales based on our defined seven dimensions of engagement annotation, including sources, data modalities used for annotation, the time when the annotation takes place, the timesteps in which the annotation takes place, level of abstraction, combination, and quantification. One of the surprising findings was that very few of the reviewed datasets on SE measurement used existing psychometrically validated engagement scales in their annotation. Lastly, we discuss some other scales in settings other than virtual learning that have the potential to be used in measuring SE in virtual learning.
2022-08-09 12:54:34 - Discover the Mysteries of the Maya: Selected Contributions from the Machine Learning Challenge & The Discovery Challenge Workshop at ECML PKDD 2021 - *Dragi Kocev, Nikola Simidjievski, Ana Kostovska, Ivica Dimitrovski, Žiga Kokalj* - `2208.03163v2` - [abs](http://arxiv.org/abs/2208.03163v2) - [pdf](http://arxiv.org/pdf/2208.03163v2) > The volume contains selected contributions from the Machine Learning Challenge "Discover the Mysteries of the Maya", presented at the Discovery Challenge Track of The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021). Remote sensing has greatly accelerated traditional archaeological landscape surveys in the forested regions of the ancient Maya. Typical exploration and discovery attempts, beside focusing on whole ancient cities, focus also on individual buildings and structures. Recently, there have been several successful attempts of utilizing machine learning for identifying ancient Maya settlements. These attempts, while relevant, focus on narrow areas and rely on high-quality aerial laser scanning (ALS) data which covers only a fraction of the region where ancient Maya were once settled. Satellite image data, on the other hand, produced by the European Space Agency's (ESA) Sentinel missions, is abundant and, more importantly, publicly available. The "Discover the Mysteries of the Maya" challenge aimed at locating and identifying ancient Maya architectures (buildings, aguadas, and platforms) by performing integrated image segmentation of different types of satellite imagery (from Sentinel-1 and Sentinel-2) data and ALS (lidar) data.
2022-08-09 13:35:26 - Hierarchical Interpretation of Neural Text Classification - *Hanqi Yan, Lin Gui, Yulan He* - `2202.09792v3` - [abs](http://arxiv.org/abs/2202.09792v3) - [pdf](http://arxiv.org/pdf/2202.09792v3) > Recent years have witnessed increasing interests in developing interpretable models in Natural Language Processing (NLP). Most existing models aim at identifying input features such as words or phrases important for model predictions. Neural models developed in NLP however often compose word semantics in a hierarchical manner and text classification requires hierarchical modelling to aggregate local information in order to deal with topic and label shifts more effectively. As such, interpretation by words or phrases only cannot faithfully explain model decisions in text classification. This paper proposes a novel Hierarchical INTerpretable neural text classifier, called Hint, which can automatically generate explanations of model predictions in the form of label-associated topics in a hierarchical manner. Model interpretation is no longer at the word level, but built on topics as the basic semantic unit. Experimental results on both review datasets and news datasets show that our proposed approach achieves text classification results on par with existing state-of-the-art text classifiers, and generates interpretations more faithful to model predictions and better understood by humans than other interpretable neural text classifiers.
2022-08-09 17:12:27 - Deep Learning-Based Objective and Reproducible Osteosarcoma Chemotherapy Response Assessment and Outcome Prediction - *David Joon Ho, Narasimhan P. Agaram, Marc-Henri Jean, Stephanie D. Suser, Cynthia Chu, Chad M. Vanderbilt, Paul A. Meyers, Leonard H. Wexler, John H. Healey, Thomas J. Fuchs, Meera R. Hameed* - `2208.04910v1` - [abs](http://arxiv.org/abs/2208.04910v1) - [pdf](http://arxiv.org/pdf/2208.04910v1) > Osteosarcoma is the most common primary bone cancer whose standard treatment includes pre-operative chemotherapy followed by resection. Chemotherapy response is used for predicting prognosis and further management of patients. Necrosis is routinely assessed post-chemotherapy from histology slides on resection specimens where necrosis ratio is defined as the ratio of necrotic tumor to overall tumor. Patients with necrosis ratio >=90% are known to have better outcome. Manual microscopic review of necrosis ratio from multiple glass slides is semi-quantitative and can have intra- and inter-observer variability. We propose an objective and reproducible deep learning-based approach to estimate necrosis ratio with outcome prediction from scanned hematoxylin and eosin whole slide images. We collected 103 osteosarcoma cases with 3134 WSIs to train our deep learning model, to validate necrosis ratio assessment, and to evaluate outcome prediction. We trained Deep Multi-Magnification Network to segment multiple tissue subtypes including viable tumor and necrotic tumor in pixel-level and to calculate case-level necrosis ratio from multiple WSIs. We showed necrosis ratio estimated by our segmentation model highly correlates with necrosis ratio from pathology reports manually assessed by experts where mean absolute differences for Grades IV (100%), III (>=90%), and II (>=50% and <90%) necrosis response are 4.4%, 4.5%, and 17.8%, respectively. We successfully stratified patients to predict overall survival with p=10^-6 and progression-free survival with p=0.012. Our reproducible approach without variability enabled us to tune cutoff thresholds, specifically for our model and our data set, to 80% for OS and 60% for PFS. Our study indicates deep learning can support pathologists as an objective tool to analyze osteosarcoma from histology for assessing treatment response and predicting patient outcome.
2022-08-10 05:08:37 - TagRec++: Hierarchical Label Aware Attention Network for Question Categorization - *Venktesh Viswanathan, Mukesh Mohania, Vikram Goyal* - `2208.05152v1` - [abs](http://arxiv.org/abs/2208.05152v1) - [pdf](http://arxiv.org/pdf/2208.05152v1) > Online learning systems have multiple data repositories in the form of transcripts, books and questions. To enable ease of access, such systems organize the content according to a well defined taxonomy of hierarchical nature (subject-chapter-topic). The task of categorizing inputs to the hierarchical labels is usually cast as a flat multi-class classification problem. Such approaches ignore the semantic relatedness between the terms in the input and the tokens in the hierarchical labels. Alternate approaches also suffer from class imbalance when they only consider leaf level nodes as labels. To tackle the issues, we formulate the task as a dense retrieval problem to retrieve the appropriate hierarchical labels for each content. In this paper, we deal with categorizing questions. We model the hierarchical labels as a composition of their tokens and use an efficient cross-attention mechanism to fuse the information with the term representations of the content. We also propose an adaptive in-batch hard negative sampling approach which samples better negatives as the training progresses. We demonstrate that the proposed approach \textit{TagRec++} outperforms existing state-of-the-art approaches on question datasets as measured by Recall@k. In addition, we demonstrate zero-shot capabilities of \textit{TagRec++} and ability to adapt to label changes.
2022-08-10 05:52:57 - Deep Learning Based Single Sample Per Person Face Recognition: A Survey - *Fan Liu, Delong Chen, Fei Wang, Zewen Li, Feng Xu* - `2006.11395v2` - [abs](http://arxiv.org/abs/2006.11395v2) - [pdf](http://arxiv.org/pdf/2006.11395v2) > Face recognition has long been an active research area in the field of artificial intelligence, particularly since the rise of deep learning in recent years. In some practical situations, each identity has only a single sample available for training. Face recognition under this situation is referred to as single sample face recognition and poses significant challenges to the effective training of deep models. Therefore, in recent years, researchers have attempted to unleash more potential of deep learning and improve the model recognition performance in the single sample situation. While several comprehensive surveys have been conducted on traditional single sample face recognition approaches, emerging deep learning based methods are rarely involved in these reviews. Accordingly, we focus on the deep learning-based methods in this paper, classifying them into virtual sample methods and generic learning methods. In the former category, virtual images or virtual features are generated to benefit the training of the deep model. In the latter one, additional multi-sample generic sets are used. There are three types of generic learning methods: combining traditional methods and deep features, improving the loss function, and improving network structure, all of which are covered in our analysis. Moreover, we review face datasets that have been commonly used for evaluating single sample face recognition models and go on to compare the results of different types of models. Additionally, we discuss problems with existing single sample face recognition methods, including identity information preservation in virtual sample methods, domain adaption in generic learning methods. Furthermore, we regard developing unsupervised methods is a promising future direction, and point out that the semantic gap as an important issue that needs to be further considered.
2022-08-10 14:37:03 - E Pluribus Unum Interpretable Convolutional Neural Networks - *George Dimas, Eirini Cholopoulou, Dimitris K. Iakovidis* - `2208.05369v1` - [abs](http://arxiv.org/abs/2208.05369v1) - [pdf](http://arxiv.org/pdf/2208.05369v1) > The adoption of Convolutional Neural Network (CNN) models in high-stake domains is hindered by their inability to meet society's demand for transparency in decision-making. So far, a growing number of methodologies have emerged for developing CNN models that are interpretable by design. However, such models are not capable of providing interpretations in accordance with human perception, while maintaining competent performance. In this paper, we tackle these challenges with a novel, general framework for instantiating inherently interpretable CNN models, named E Pluribus Unum Interpretable CNN (EPU-CNN). An EPU-CNN model consists of CNN sub-networks, each of which receives a different representation of an input image expressing a perceptual feature, such as color or texture. The output of an EPU-CNN model consists of the classification prediction and its interpretation, in terms of relative contributions of perceptual features in different regions of the input image. EPU-CNN models have been extensively evaluated on various publicly available datasets, as well as a contributed benchmark dataset. Medical datasets are used to demonstrate the applicability of EPU-CNN for risk-sensitive decisions in medicine. The experimental results indicate that EPU-CNN models can achieve a comparable or better classification performance than other CNN architectures while providing humanly perceivable interpretations.
2022-08-10 15:39:20 - Towards Autonomous Atlas-based Ultrasound Acquisitions in Presence of Articulated Motion - *Zhongliang Jiang, Yuan Gao, Le Xie, Nassir Navab* - `2208.05399v1` - [abs](http://arxiv.org/abs/2208.05399v1) - [pdf](http://arxiv.org/pdf/2208.05399v1) > Robotic ultrasound (US) imaging aims at overcoming some of the limitations of free-hand US examinations, e.g. difficulty in guaranteeing intra- and inter-operator repeatability. However, due to anatomical and physiological variations between patients and relative movement of anatomical substructures, it is challenging to robustly generate optimal trajectories to examine the anatomies of interest, in particular, when they comprise articulated joints. To address this challenge, this paper proposes a vision-based approach allowing autonomous robotic US limb scanning. To this end, an atlas MRI template of a human arm with annotated vascular structures is used to generate trajectories and register and project them onto patients' skin surfaces for robotic US acquisition. To effectively segment and accurately reconstruct the targeted 3D vessel, we make use of spatial continuity in consecutive US frames by incorporating channel attention modules into a U-Net-type neural network. The automatic trajectory generation method is evaluated on six volunteers with various articulated joint angles. In all cases, the system can successfully acquire the planned vascular structure on volunteers' limbs. For one volunteer the MRI scan was also available, which allows the evaluation of the average radius of the scanned artery from US images, resulting in a radius estimation ($1.2\pm0.05~mm$) comparable to the MRI ground truth ($1.2\pm0.04~mm$).
2022-08-11 09:56:02 - Deep Learning for Deepfakes Creation and Detection: A Survey - *Thanh Thi Nguyen, Quoc Viet Hung Nguyen, Dung Tien Nguyen, Duc Thanh Nguyen, Thien Huynh-The, Saeid Nahavandi, Thanh Tam Nguyen, Quoc-Viet Pham, Cuong M. Nguyen* - `1909.11573v5` - [abs](http://arxiv.org/abs/1909.11573v5) - [pdf](http://arxiv.org/pdf/1909.11573v5) > Deep learning has been successfully applied to solve various complex problems ranging from big data analytics to computer vision and human-level control. Deep learning advances however have also been employed to create software that can cause threats to privacy, democracy and national security. One of those deep learning-powered applications recently emerged is deepfake. Deepfake algorithms can create fake images and videos that humans cannot distinguish them from authentic ones. The proposal of technologies that can automatically detect and assess the integrity of digital visual media is therefore indispensable. This paper presents a survey of algorithms used to create deepfakes and, more importantly, methods proposed to detect deepfakes in the literature to date. We present extensive discussions on challenges, research trends and directions related to deepfake technologies. By reviewing the background of deepfakes and state-of-the-art deepfake detection methods, this study provides a comprehensive overview of deepfake techniques and facilitates the development of new and more robust methods to deal with the increasingly challenging deepfakes.
2022-08-11 11:27:38 - A Comprehensive Survey of Natural Language Generation Advances from the Perspective of Digital Deception - *Keenan Jones, Enes Altuncu, Virginia N. L. Franqueira, Yichao Wang, Shujun Li* - `2208.05757v1` - [abs](http://arxiv.org/abs/2208.05757v1) - [pdf](http://arxiv.org/pdf/2208.05757v1) > In recent years there has been substantial growth in the capabilities of systems designed to generate text that mimics the fluency and coherence of human language. From this, there has been considerable research aimed at examining the potential uses of these natural language generators (NLG) towards a wide number of tasks. The increasing capabilities of powerful text generators to mimic human writing convincingly raises the potential for deception and other forms of dangerous misuse. As these systems improve, and it becomes ever harder to distinguish between human-written and machine-generated text, malicious actors could leverage these powerful NLG systems to a wide variety of ends, including the creation of fake news and misinformation, the generation of fake online product reviews, or via chatbots as means of convincing users to divulge private information. In this paper, we provide an overview of the NLG field via the identification and examination of 119 survey-like papers focused on NLG research. From these identified papers, we outline a proposed high-level taxonomy of the central concepts that constitute NLG, including the methods used to develop generalised NLG systems, the means by which these systems are evaluated, and the popular NLG tasks and subtasks that exist. In turn, we provide an overview and discussion of each of these items with respect to current research and offer an examination of the potential roles of NLG in deception and detection systems to counteract these threats. Moreover, we discuss the broader challenges of NLG, including the risks of bias that are often exhibited by existing text generation systems. This work offers a broad overview of the field of NLG with respect to its potential for misuse, aiming to provide a high-level understanding of this rapidly developing area of research.
2022-08-11 16:30:58 - Applying data technologies to combat AMR: current status, challenges, and opportunities on the way forward - *Leonid Chindelevitch, Elita Jauneikaite, Nicole E. Wheeler, Kasim Allel, Bede Yaw Ansiri-Asafoakaa, Wireko A. Awuah, Denis C. Bauer, Stephan Beisken, Kara Fan, Gary Grant, Michael Graz, Yara Khalaf, Veranja Liyanapathirana, Carlos Montefusco-Pereira, Lawrence Mugisha, Atharv Naik, Sylvia Nanono, Anthony Nguyen, Timothy Rawson, Kessendri Reddy, Juliana M. Ruzante, Anneke Schmider, Roman Stocker, Leonhardt Unruh, Daniel Waruingi, Heather Graz, Maarten van Dongen* - `2208.04683v2` - [abs](http://arxiv.org/abs/2208.04683v2) - [pdf](http://arxiv.org/pdf/2208.04683v2) > Antimicrobial resistance (AMR) is a growing public health threat, estimated to cause over 10 million deaths per year and cost the global economy 100 trillion USD by 2050 under status quo projections. These losses would mainly result from an increase in the morbidity and mortality from treatment failure, AMR infections during medical procedures, and a loss of quality of life attributed to AMR. Numerous interventions have been proposed to control the development of AMR and mitigate the risks posed by its spread. This paper reviews key aspects of bacterial AMR management and control which make essential use of data technologies such as artificial intelligence, machine learning, and mathematical and statistical modelling, fields that have seen rapid developments in this century. Although data technologies have become an integral part of biomedical research, their impact on AMR management has remained modest. We outline the use of data technologies to combat AMR, detailing recent advancements in four complementary categories: surveillance, prevention, diagnosis, and treatment. We provide an overview on current AMR control approaches using data technologies within biomedical research, clinical practice, and in the "One Health" context. We discuss the potential impact and challenges wider implementation of data technologies is facing in high-income as well as in low- and middle-income countries, and recommend concrete actions needed to allow these technologies to be more readily integrated within the healthcare and public health sectors.
2022-08-11 22:50:51 - ICIP 2022 Challenge on Parasitic Egg Detection and Classification in Microscopic Images: Dataset, Methods and Results - *Nantheera Anantrasirichai, Thanarat H. Chalidabhongse, Duangdao Palasuwan, Korranat Naruenatthanaset, Thananop Kobchaisawat, Nuntiporn Nunthanasup, Kanyarat Boonpeng, Xudong Ma, Alin Achim* - `2208.06063v1` - [abs](http://arxiv.org/abs/2208.06063v1) - [pdf](http://arxiv.org/pdf/2208.06063v1) > Manual examination of faecal smear samples to identify the existence of parasitic eggs is very time-consuming and can only be done by specialists. Therefore, an automated system is required to tackle this problem since it can relate to serious intestinal parasitic infections. This paper reviews the ICIP 2022 Challenge on parasitic egg detection and classification in microscopic images. We describe a new dataset for this application, which is the largest dataset of its kind. The methods used by participants in the challenge are summarised and discussed along with their results.
2022-08-11 23:04:48 - Mixed-Precision Neural Networks: A Survey - *Mariam Rakka, Mohammed E. Fouda, Pramod Khargonekar, Fadi Kurdahi* - `2208.06064v1` - [abs](http://arxiv.org/abs/2208.06064v1) - [pdf](http://arxiv.org/pdf/2208.06064v1) > Mixed-precision Deep Neural Networks achieve the energy efficiency and throughput needed for hardware deployment, particularly when the resources are limited, without sacrificing accuracy. However, the optimal per-layer bit precision that preserves accuracy is not easily found, especially with the abundance of models, datasets, and quantization techniques that creates an enormous search space. In order to tackle this difficulty, a body of literature has emerged recently, and several frameworks that achieved promising accuracy results have been proposed. In this paper, we start by summarizing the quantization techniques used generally in literature. Then, we present a thorough survey of the mixed-precision frameworks, categorized according to their optimization techniques such as reinforcement learning and quantization techniques like deterministic rounding. Furthermore, the advantages and shortcomings of each framework are discussed, where we present a juxtaposition. We finally give guidelines for future mixed-precision frameworks.
2022-08-12 08:24:43 - Domain Generalization: A Survey - *Kaiyang Zhou, Ziwei Liu, Yu Qiao, Tao Xiang, Chen Change Loy* - `2103.02503v7` - [abs](http://arxiv.org/abs/2103.02503v7) - [pdf](http://arxiv.org/pdf/2103.02503v7) > Generalization to out-of-distribution (OOD) data is a capability natural to humans yet challenging for machines to reproduce. This is because most learning algorithms strongly rely on the i.i.d.~assumption on source/target data, which is often violated in practice due to domain shift. Domain generalization (DG) aims to achieve OOD generalization by using only source data for model learning. Over the last ten years, research in DG has made great progress, leading to a broad spectrum of methodologies, e.g., those based on domain alignment, meta-learning, data augmentation, or ensemble learning, to name a few; DG has also been studied in various application areas including computer vision, speech recognition, natural language processing, medical imaging, and reinforcement learning. In this paper, for the first time a comprehensive literature review in DG is provided to summarize the developments over the past decade. Specifically, we first cover the background by formally defining DG and relating it to other relevant fields like domain adaptation and transfer learning. Then, we conduct a thorough review into existing methods and theories. Finally, we conclude this survey with insights and discussions on future research directions.
2022-08-12 22:26:05 - Traditional methods in Edge, Corner and Boundary detection - *Sai Pavan Tadem* - `2208.07714v1` - [abs](http://arxiv.org/abs/2208.07714v1) - [pdf](http://arxiv.org/pdf/2208.07714v1) > This is a review paper of traditional approaches for edge, corner, and boundary detection methods. There are many real-world applications of edge, corner, and boundary detection methods. For instance, in medical image analysis, edge detectors are used to extract the features from the given image. In modern innovations like autonomous vehicles, edge detection and segmentation are the most crucial things. If we want to detect motion or track video, corner detectors help. I tried to compare the results of detectors stage-wise wherever it is possible and also discussed the importance of image prepossessing to minimise the noise. Real-world images are used to validate detector performance and limitations.
2022-08-12 23:58:57 - A Gentle Introduction and Survey on Computing with Words (CWW) Methodologies - *Prashant K. Gupta, Javier Andreu-Perez* - `2208.06532v1` - [abs](http://arxiv.org/abs/2208.06532v1) - [pdf](http://arxiv.org/pdf/2208.06532v1) > Human beings have an inherent capability to use linguistic information (LI) seamlessly even though it is vague and imprecise. Computing with Words (CWW) was proposed to impart computing systems with this capability of human beings. The interest in the field of CWW is evident from a number of publications on various CWW methodologies. These methodologies use different ways to model the semantics of the LI. However, to the best of our knowledge, the literature on these methodologies is mostly scattered and does not give an interested researcher a comprehensive but gentle guide about the notion and utility of these methodologies. Hence, to introduce the foundations and state-of-the-art CWW methodologies, we provide a concise but a wide-ranging coverage of them in a simple and easy to understand manner. We feel that the simplicity with which we give a high-quality review and introduction to the CWW methodologies is very useful for investigators, especially those embarking on the use of CWW for the first time. We also provide future research directions to build upon for the interested and motivated researchers.
2022-08-13 01:20:39 - MaskBlock: Transferable Adversarial Examples with Bayes Approach - *Mingyuan Fan, Cen Chen, Ximeng Liu, Wenzhong Guo* - `2208.06538v1` - [abs](http://arxiv.org/abs/2208.06538v1) - [pdf](http://arxiv.org/pdf/2208.06538v1) > The transferability of adversarial examples (AEs) across diverse models is of critical importance for black-box adversarial attacks, where attackers cannot access the information about black-box models. However, crafted AEs always present poor transferability. In this paper, by regarding the transferability of AEs as generalization ability of the model, we reveal that vanilla black-box attacks craft AEs via solving a maximum likelihood estimation (MLE) problem. For MLE, the results probably are model-specific local optimum when available data is small, i.e., limiting the transferability of AEs. By contrast, we re-formulate crafting transferable AEs as the maximizing a posteriori probability estimation problem, which is an effective approach to boost the generalization of results with limited available data. Because Bayes posterior inference is commonly intractable, a simple yet effective method called MaskBlock is developed to approximately estimate. Moreover, we show that the formulated framework is a generalization version for various attack methods. Extensive experiments illustrate MaskBlock can significantly improve the transferability of crafted adversarial examples by up to about 20%.
2022-08-13 02:49:28 - Learning with Limited Annotations: A Survey on Deep Semi-Supervised Learning for Medical Image Segmentation - *Rushi Jiao, Yichi Zhang, Le Ding, Rong Cai, Jicong Zhang* - `2207.14191v2` - [abs](http://arxiv.org/abs/2207.14191v2) - [pdf](http://arxiv.org/pdf/2207.14191v2) > Medical image segmentation is a fundamental and critical step in many image-guided clinical approaches. Recent success of deep learning-based segmentation methods usually relies on a large amount of labeled data, which is particularly difficult and costly to obtain especially in the medical imaging domain where only experts can provide reliable and accurate annotations. Semi-supervised learning has emerged as an appealing strategy and been widely applied to medical image segmentation tasks to train deep models with limited annotations. In this paper, we present a comprehensive review of recently proposed semi-supervised learning methods for medical image segmentation and summarized both the technical novelties and empirical results. Furthermore, we analyze and discuss the limitations and several unsolved problems of existing approaches. We hope this review could inspire the research community to explore solutions for this challenge and further promote the developments in medical image segmentation field.
2022-08-13 13:46:13 - Differentiable Inductive Logic Programming in High-Dimensional Space - *Stanisław J. Purgał, David M. Cerna, Cezary Kaliszyk* - `2208.06652v1` - [abs](http://arxiv.org/abs/2208.06652v1) - [pdf](http://arxiv.org/pdf/2208.06652v1) > Synthesizing large logic programs through Inductive Logic Programming (ILP) typically requires intermediate definitions. However, cluttering the hypothesis space with intensional predicates often degrades performance. In contrast, gradient descent provides an efficient way to find solutions within such high-dimensional spaces. Neuro-symbolic ILP approaches have not fully exploited this so far. We propose an approach to ILP-based synthesis benefiting from large-scale predicate invention exploiting the efficacy of high-dimensional gradient descent. We find symbolic solutions containing upwards of ten auxiliary definitions. This is beyond the achievements of existing neuro-symbolic ILP systems, thus constituting a milestone in the field.
2022-08-13 16:45:30 - Modeling Biological Face Recognition with Deep Convolutional Neural Networks - *Leonard E. van Dyck, Walter R. Gruber* - `2208.06681v1` - [abs](http://arxiv.org/abs/2208.06681v1) - [pdf](http://arxiv.org/pdf/2208.06681v1) > Deep Convolutional Neural Networks (DCNNs) have become the state-of-the-art computational models of biological object recognition. Their remarkable success has helped vision science break new ground. Consequently, recent efforts have started to transfer this achievement to the domain of biological face recognition. In this regard, face detection can be investigated through comparisons of face-selective biological areas and neurons to artificial layers and units. Similarly, face identification can be examined through comparisons of in vivo and in silico face space representations. In this mini-review, we summarize the first studies with this aim. We argue that DCNNs are useful models, which follow the general hierarchical organization of biological face recognition. In two spotlights, we emphasize unique scientific contributions of these models. Firstly, studies on face detection in DCNNs propose that elementary face-selectivity emerges automatically through feedforward processes. Secondly, studies on face identification in DCNNs suggest that experience and additional generative mechanisms are required for this challenge. Taken together, as this novel computational approach enables close control of predisposition (i.e., architecture) and experience (i.e., training data), this could also inform longstanding debates on the substrates of biological face recognition.
2022-08-14 10:54:49 - BDSL 49: A Comprehensive Dataset of Bangla Sign Language - *Ayman Hasib, Saqib Sizan Khan, Jannatul Ferdous Eva, Mst. Nipa Khatun, Ashraful Haque, Nishat Shahrin, Rashik Rahman, Hasan Murad, Md. Rajibul Islam, Molla Rashied Hussein* - `2208.06827v1` - [abs](http://arxiv.org/abs/2208.06827v1) - [pdf](http://arxiv.org/pdf/2208.06827v1) > Language is a method by which individuals express their thoughts. Each language has its own set of alphabetic and numeric characters. People can communicate with one another through either oral or written communication. However, each language has a sign language counterpart. Individuals who are deaf and/or mute communicate through sign language. The Bangla language also has a sign language, which is called BDSL. The dataset is about Bangla hand sign images. The collection contains 49 individual Bangla alphabet images in sign language. BDSL49 is a dataset that consists of 29,490 images with 49 labels. Images of 14 different adult individuals, each with a distinct background and appearance, have been recorded during data collection. Several strategies have been used to eliminate noise from datasets during preparation. This dataset is available to researchers for free. They can develop automated systems using machine learning, computer vision, and deep learning techniques. In addition, two models were used in this dataset. The first is for detection, while the second is for recognition.
2022-08-14 23:51:52 - GNPassGAN: Improved Generative Adversarial Networks For Trawling Offline Password Guessing - *Fangyi Yu, Miguel Vargas Martin* - `2208.06943v1` - [abs](http://arxiv.org/abs/2208.06943v1) - [pdf](http://arxiv.org/pdf/2208.06943v1) > The security of passwords depends on a thorough understanding of the strategies used by attackers. Unfortunately, real-world adversaries use pragmatic guessing tactics like dictionary attacks, which are difficult to simulate in password security research. Dictionary attacks must be carefully configured and modified to represent an actual threat. This approach, however, needs domain-specific knowledge and expertise that are difficult to duplicate. This paper reviews various deep learning-based password guessing approaches that do not require domain knowledge or assumptions about users' password structures and combinations. It also introduces GNPassGAN, a password guessing tool built on generative adversarial networks for trawling offline attacks. In comparison to the state-of-the-art PassGAN model, GNPassGAN is capable of guessing 88.03\% more passwords and generating 31.69\% fewer duplicates.
2022-08-15 01:32:09 - InvisibiliTee: Angle-agnostic Cloaking from Person-Tracking Systems with a Tee - *Yaxian Li, Bingqing Zhang, Guoping Zhao, Mingyu Zhang, Jiajun Liu, Ziwei Wang, Jirong Wen* - `2208.06962v1` - [abs](http://arxiv.org/abs/2208.06962v1) - [pdf](http://arxiv.org/pdf/2208.06962v1) > After a survey for person-tracking system-induced privacy concerns, we propose a black-box adversarial attack method on state-of-the-art human detection models called InvisibiliTee. The method learns printable adversarial patterns for T-shirts that cloak wearers in the physical world in front of person-tracking systems. We design an angle-agnostic learning scheme which utilizes segmentation of the fashion dataset and a geometric warping process so the adversarial patterns generated are effective in fooling person detectors from all camera angles and for unseen black-box detection models. Empirical results in both digital and physical environments show that with the InvisibiliTee on, person-tracking systems' ability to detect the wearer drops significantly.
2022-08-15 13:39:45 - A Medical Information Extraction Workbench to Process German Clinical Text - *Roland Roller, Laura Seiffe, Ammer Ayach, Sebastian Möller, Oliver Marten, Michael Mikhailov, Christoph Alt, Danilo Schmidt, Fabian Halleck, Marcel Naik, Wiebke Duettmann, Klemens Budde* - `2207.03885v2` - [abs](http://arxiv.org/abs/2207.03885v2) - [pdf](http://arxiv.org/pdf/2207.03885v2) > Background: In the information extraction and natural language processing domain, accessible datasets are crucial to reproduce and compare results. Publicly available implementations and tools can serve as benchmark and facilitate the development of more complex applications. However, in the context of clinical text processing the number of accessible datasets is scarce -- and so is the number of existing tools. One of the main reasons is the sensitivity of the data. This problem is even more evident for non-English languages. Approach: In order to address this situation, we introduce a workbench: a collection of German clinical text processing models. The models are trained on a de-identified corpus of German nephrology reports. Result: The presented models provide promising results on in-domain data. Moreover, we show that our models can be also successfully applied to other biomedical text in German. Our workbench is made publicly available so it can be used out of the box, as a benchmark or transferred to related problems.
2022-08-15 15:51:29 - Federated Learning for Medical Applications: A Taxonomy, Current Trends, Challenges, and Future Research Directions - *Ashish Rauniyar, Desta Haileselassie Hagos, Debesh Jha, Jan Erik Håkegård, Ulas Bagci, Danda B. Rawat, Vladimir Vlassov* - `2208.03392v3` - [abs](http://arxiv.org/abs/2208.03392v3) - [pdf](http://arxiv.org/pdf/2208.03392v3) > With the advent of the IoT, AI, and ML/DL algorithms, the data-driven medical application has emerged as a promising tool for designing reliable and scalable diagnostic and prognostic models from medical data. This has attracted a great deal of attention from academia to industry in recent years. This has undoubtedly improved the quality of healthcare delivery. However, these AI-based medical applications still have poor adoption due to their difficulties in satisfying strict security, privacy, and quality of service standards (such as low latency). Moreover, medical data are usually fragmented and private, making it challenging to generate robust results across populations. Recent developments in federated learning (FL) have made it possible to train complex machine-learned models in a distributed manner. Thus, FL has become an active research domain, particularly processing the medical data at the edge of the network in a decentralized way to preserve privacy and security concerns. To this end, this survey paper highlights the current and future of FL technology in medical applications where data sharing is a significant burden. It also review and discuss the current research trends and their outcomes for designing reliable and scalable FL models. We outline the general FL's statistical problems, device challenges, security, privacy concerns, and its potential in the medical domain. Moreover, our study is also focused on medical applications where we highlight the burden of global cancer and the efficient use of FL for the development of computer-aided diagnosis tools for addressing them. We hope that this review serves as a checkpoint that sets forth the existing state-of-the-art works in a thorough manner and offers open problems and future research directions for this field.
2022-08-15 16:26:13 - Learn2Trust: A video and streamlit-based educational programme for AI-based medical image analysis targeted towards medical students - *Hanna Siebert, Marian Himstedt, Mattias Heinrich* - `2208.07314v1` - [abs](http://arxiv.org/abs/2208.07314v1) - [pdf](http://arxiv.org/pdf/2208.07314v1) > In order to be able to use artificial intelligence (AI) in medicine without scepticism and to recognise and assess its growing potential, a basic understanding of this topic is necessary among current and future medical staff. Under the premise of "trust through understanding", we developed an innovative online course as a learning opportunity within the framework of the German KI Campus (AI campus) project, which is a self-guided course that teaches the basics of AI for the analysis of medical image data. The main goal is to provide a learning environment for a sufficient understanding of AI in medical image analysis so that further interest in this topic is stimulated and inhibitions towards its use can be overcome by means of positive application experience. The focus was on medical applications and the fundamentals of machine learning. The online course was divided into consecutive lessons, which include theory in the form of explanatory videos, practical exercises in the form of Streamlit and practical exercises and/or quizzes to check learning progress. A survey among the participating medical students in the first run of the course was used to analyse our research hypotheses quantitatively.
2022-08-15 18:30:22 - A Survey of Recommender System Techniques and the Ecommerce Domain - *Imran Hossain, Md Aminul Haque Palash, Anika Tabassum Sejuty, Noor A Tanjim, MD Abdullah AL Nasim, Sarwar Saif, Abu Bokor Suraj* - `2208.07399v1` - [abs](http://arxiv.org/abs/2208.07399v1) - [pdf](http://arxiv.org/pdf/2208.07399v1) > In this big data era, it is hard for the current generation to find the right data from the huge amount of data contained within online platforms. In such a situation, there is a need for an information filtering system that might help them find the information they are looking for. In recent years, a research field has emerged known as recommender systems. Recommenders have become important as they have many real-life applications. This paper reviews the different techniques and developments of recommender systems in e-commerce, e-tourism, e-resources, e-government, e-learning, and e-library. By analyzing recent work on this topic, we will be able to provide a detailed overview of current developments and identify existing difficulties in recommendation systems. The final results give practitioners and researchers the necessary guidance and insights into the recommendation system and its application.
2022-08-15 20:05:07 - Deep Unsupervised Domain Adaptation: A Review of Recent Advances and Perspectives - *Xiaofeng Liu, Chaehwa Yoo, Fangxu Xing, Hyejin Oh, Georges El Fakhri, Je-Won Kang, Jonghye Woo* - `2208.07422v1` - [abs](http://arxiv.org/abs/2208.07422v1) - [pdf](http://arxiv.org/pdf/2208.07422v1) > Deep learning has become the method of choice to tackle real-world problems in different domains, partly because of its ability to learn from data and achieve impressive performance on a wide range of applications. However, its success usually relies on two assumptions: (i) vast troves of labeled datasets are required for accurate model fitting, and (ii) training and testing data are independent and identically distributed. Its performance on unseen target domains, thus, is not guaranteed, especially when encountering out-of-distribution data at the adaptation stage. The performance drop on data in a target domain is a critical problem in deploying deep neural networks that are successfully trained on data in a source domain. Unsupervised domain adaptation (UDA) is proposed to counter this, by leveraging both labeled source domain data and unlabeled target domain data to carry out various tasks in the target domain. UDA has yielded promising results on natural image processing, video analysis, natural language processing, time-series data analysis, medical image analysis, etc. In this review, as a rapidly evolving topic, we provide a systematic comparison of its methods and applications. In addition, the connection of UDA with its closely related tasks, e.g., domain generalization and out-of-distribution detection, has also been discussed. Furthermore, deficiencies in current methods and possible promising directions are highlighted.
2022-08-16 01:33:43 - A Survey on Learnable Evolutionary Algorithms for Scalable Multiobjective Optimization - *Songbai Liu, Qiuzhen Lin, Jianqiang Li, Kay Chen Tan* - `2206.11526v2` - [abs](http://arxiv.org/abs/2206.11526v2) - [pdf](http://arxiv.org/pdf/2206.11526v2) > Recent decades have witnessed remarkable advancements in multiobjective evolutionary algorithms (MOEAs) that have been adopted to solve various multiobjective optimization problems (MOPs). However, these progressively improved MOEAs have not necessarily been equipped with sophisticatedly scalable and learnable problem-solving strategies that are able to cope with new and grand challenges brought by the scaling-up MOPs with continuously increasing complexity or scale from diverse aspects, mainly including expensive function evaluations, many objectives, large-scale search space, time-varying environments, and multitask. Under different scenarios, it requires divergent thinking to design new powerful MOEAs for solving them effectively. In this context, research into learnable MOEAs that arm themselves with machine learning techniques for scaling-up MOPs has received extensive attention in the field of evolutionary computation. In this paper, we begin with a taxonomy of scalable MOPs and learnable MOEAs, followed by an analysis of the challenges that scaling up MOPs pose to traditional MOEAs. Then, we synthetically overview recent advances of learnable MOEAs in solving various scaling up MOPs, focusing primarily on three attractive and promising directions (i.e., learnable evolutionary discriminators for environmental selection, learnable evolutionary generators for reproduction, and learnable evolutionary transfer for sharing or reusing optimization experience between different problem domains). The insight into learnable MOEAs held throughout this paper is offered to the readers as a reference to the general track of the efforts in this field.
2022-08-16 03:32:13 - Towards Domain-Independent and Real-Time Gesture Recognition Using mmWave Signal - *Yadong Li, Dongheng Zhang, Jinbo Chen, Jinwei Wan, Dong Zhang, Yang Hu, Qibin Sun, Yan Chen* - `2111.06195v2` - [abs](http://arxiv.org/abs/2111.06195v2) - [pdf](http://arxiv.org/pdf/2111.06195v2) > Human gesture recognition using millimeter-wave (mmWave) signals provides attractive applications including smart home and in-car interfaces. While existing works achieve promising performance under controlled settings, practical applications are still limited due to the need of intensive data collection, extra training efforts when adapting to new domains, and poor performance for real-time recognition. In this paper, we propose DI-Gesture, a domain-independent and real-time mmWave gesture recognition system. Specifically, we first derive signal variations corresponding to human gestures with spatial-temporal processing. To enhance the robustness of the system and reduce data collecting efforts, we design a data augmentation framework for mmWave signals based on correlations between signal patterns and gesture variations. Furthermore, a spatial-temporal gesture segmentation algorithm is employed for real-time recognition. Extensive experimental results show DI-Gesture achieves an average accuracy of 97.92\%, 99.18\%, and 98.76\% for new users, environments, and locations, respectively. We also evaluate DI-Gesture in challenging scenarios like real-time recognition and sensing at extreme angles, all of which demonstrates the superior robustness and effectiveness of our system.
2022-08-16 06:50:04 - Supernet Training for Federated Image Classification under System Heterogeneity - *Taehyeon Kim, Se-Young Yun* - `2206.01366v4` - [abs](http://arxiv.org/abs/2206.01366v4) - [pdf](http://arxiv.org/pdf/2206.01366v4) > Efficient deployment of deep neural networks across many devices and resource constraints, especially on edge devices, is one of the most challenging problems in the presence of data-privacy preservation issues. Conventional approaches have evolved to either improve a single global model while keeping each local training data decentralized (i.e., data-heterogeneity) or to train a once-for-all network that supports diverse architectural settings to address heterogeneous systems equipped with different computational capabilities (i.e., model-heterogeneity). However, little research has considered both directions simultaneously. In this work, we propose a novel framework to consider both scenarios, namely Federation of Supernet Training (FedSup), where clients send and receive a supernet whereby it contains all possible architectures sampled from itself. It is inspired by how averaging parameters in the model aggregation stage of Federated Learning (FL) is similar to weight-sharing in supernet training. Specifically, in the FedSup framework, a weight-sharing approach widely used in the training single shot model is combined with the averaging of Federated Learning (FedAvg). Under our framework, we present an efficient algorithm (E-FedSup) by sending the sub-model to clients in the broadcast stage for reducing communication costs and training overhead. We demonstrate several strategies to enhance supernet training in the FL environment and conduct extensive empirical evaluations. The resulting framework is shown to pave the way for the robustness of both data- and model-heterogeneity on several standard benchmarks.
2022-08-16 08:47:00 - On Efficient Real-Time Semantic Segmentation: A Survey - *Christopher J. Holder, Muhammad Shafique* - `2206.08605v2` - [abs](http://arxiv.org/abs/2206.08605v2) - [pdf](http://arxiv.org/pdf/2206.08605v2) > Semantic segmentation is the problem of assigning a class label to every pixel in an image, and is an important component of an autonomous vehicle vision stack for facilitating scene understanding and object detection. However, many of the top performing semantic segmentation models are extremely complex and cumbersome, and as such are not suited to deployment onboard autonomous vehicle platforms where computational resources are limited and low-latency operation is a vital requirement. In this survey, we take a thorough look at the works that aim to address this misalignment with more compact and efficient models capable of deployment on low-memory embedded systems while meeting the constraint of real-time inference. We discuss several of the most prominent works in the field, placing them within a taxonomy based on their major contributions, and finally we evaluate the inference speed of the discussed models under consistent hardware and software setups that represent a typical research environment with high-end GPU and a realistic deployed scenario using low-memory embedded GPU hardware. Our experimental results demonstrate that many works are capable of real-time performance on resource-constrained hardware, while illustrating the consistent trade-off between latency and accuracy.
2022-08-16 10:05:19 - A Review of the Convergence of 5G/6G Architecture and Deep Learning - *Olusola T. Odeyomi, Olubiyi O. Akintade, Temitayo O. Olowu, Gergely Zaruba* - `2208.07643v1` - [abs](http://arxiv.org/abs/2208.07643v1) - [pdf](http://arxiv.org/pdf/2208.07643v1) > The convergence of 5G architecture and deep learning has gained a lot of research interests in both the fields of wireless communication and artificial intelligence. This is because deep learning technologies have been identified to be the potential driver of the 5G technologies, that make up the 5G architecture. Hence, there have been extensive surveys on the convergence of 5G architecture and deep learning. However, most of the existing survey papers mainly focused on how deep learning can converge with a specific 5G technology, thus, not covering the full spectrum of the 5G architecture. Although there is a recent survey paper that appears to be robust, a review of that paper shows that it is not well structured to specifically cover the convergence of deep learning and the 5G technologies. Hence, this paper provides a robust overview of the convergence of the key 5G technologies and deep learning. The challenges faced by such convergence are discussed. In addition, a brief overview of the future 6G architecture, and how it can converge with deep learning is also discussed.
2022-08-16 15:40:17 - Fair Machine Learning in Healthcare: A Review - *Qizhang Feng, Mengnan Du, Na Zou, Xia Hu* - `2206.14397v2` - [abs](http://arxiv.org/abs/2206.14397v2) - [pdf](http://arxiv.org/pdf/2206.14397v2) > Benefiting from the digitization of healthcare data and the development of computing power, machine learning methods are increasingly used in the healthcare domain. Fairness problems have been identified in machine learning for healthcare, resulting in an unfair allocation of limited healthcare resources or excessive health risks for certain groups. Therefore, addressing the fairness problems has recently attracted increasing attention from the healthcare community. However, the intersection of machine learning for healthcare and fairness in machine learning remains understudied. In this review, we build the bridge by exposing fairness problems, summarizing possible biases, sorting out mitigation methods and pointing out challenges along with opportunities for the future.
2022-08-16 16:40:01 - A Survey of Ad Hoc Teamwork Research - *Reuth Mirsky, Ignacio Carlucho, Arrasy Rahman, Elliot Fosong, William Macke, Mohan Sridharan, Peter Stone, Stefano V. Albrecht* - `2202.10450v3` - [abs](http://arxiv.org/abs/2202.10450v3) - [pdf](http://arxiv.org/pdf/2202.10450v3) > Ad hoc teamwork is the research problem of designing agents that can collaborate with new teammates without prior coordination. This survey makes a two-fold contribution: First, it provides a structured description of the different facets of the ad hoc teamwork problem. Second, it discusses the progress that has been made in the field so far, and identifies the immediate and long-term open problems that need to be addressed in ad hoc teamwork.
2022-08-16 17:21:38 - Neuron-level Interpretation of Deep NLP Models: A Survey - *Hassan Sajjad, Nadir Durrani, Fahim Dalvi* - `2108.13138v2` - [abs](http://arxiv.org/abs/2108.13138v2) - [pdf](http://arxiv.org/pdf/2108.13138v2) > The proliferation of deep neural networks in various domains has seen an increased need for interpretability of these models. Preliminary work done along this line and papers that surveyed such, are focused on high-level representation analysis. However, a recent branch of work has concentrated on interpretability at a more granular level of analyzing neurons within these models. In this paper, we survey the work done on neuron analysis including: i) methods to discover and understand neurons in a network, ii) evaluation methods, iii) major findings including cross architectural comparisons that neuron analysis has unraveled, iv) applications of neuron probing such as: controlling the model, domain adaptation etc., and v) a discussion on open issues and future research directions.
2022-08-16 17:36:09 - Do Invariances in Deep Neural Networks Align with Human Perception? - *Vedant Nanda, Ayan Majumdar, Camila Kolling, John P. Dickerson, Krishna P. Gummadi, Bradley C. Love, Adrian Weller* - `2111.14726v3` - [abs](http://arxiv.org/abs/2111.14726v3) - [pdf](http://arxiv.org/pdf/2111.14726v3) > An evaluation criterion for safe and trustworthy deep learning is how well the invariances captured by representations of deep neural networks (DNNs) are shared with humans. We identify challenges in measuring these invariances. Prior works used gradient-based methods to generate \textit{identically represented inputs} (IRIs), \ie, inputs which have identical representations (on a given layer) of a neural network, and thus capture invariances of a given network. One necessary criterion for a network's invariances to align with human perception is for its IRIs look `similar` to humans. Prior works, however, have mixed takeaways; some argue that later layers of DNNs do not learn human-like invariances (\cite{jenelle2019metamers}) yet others seem to indicate otherwise (\cite{mahendran2014understanding}). We argue that the loss function used to generate IRIs can heavily affect takeaways about invariances of the network and is the primary reason for these conflicting findings. We propose an \textit{adversarial} regularizer on the IRI generation loss that finds IRIs that make any model appear to have very little shared invariance with humans. Based on this evidence, we argue that there is scope for improving models to have human-like invariances, and further, to have meaningful comparisons between models one should use IRIs generated using the \textit{regularizer-free} loss. We then conduct an in-depth investigation of how different components (\eg~architectures, training losses, data augmentations) of the deep learning pipeline contribute to learning models that have good alignment with humans. We find that architectures with residual connections trained using a (self-supervised) contrastive loss with $\ell_p$ ball adversarial data augmentation tend to learn invariances that are most aligned with humans.
2022-08-17 02:43:23 - UniLayout: Taming Unified Sequence-to-Sequence Transformers for Graphic Layout Generation - *Zhaoyun Jiang, Huayu Deng, Zhongkai Wu, Jiaqi Guo, Shizhao Sun, Vuksan Mijovic, Zijiang Yang, Jian-Guang Lou, Dongmei Zhang* - `2208.08037v1` - [abs](http://arxiv.org/abs/2208.08037v1) - [pdf](http://arxiv.org/pdf/2208.08037v1) > To satisfy various user needs, different subtasks of graphic layout generation have been explored intensively in recent years. Existing studies usually propose task-specific methods with diverse input-output formats, dedicated model architectures, and different learning methods. However, those specialized approaches make the adaption to unseen subtasks difficult, hinder the knowledge sharing between different subtasks, and are contrary to the trend of devising general-purpose models. In this work, we propose UniLayout, which handles different subtasks for graphic layout generation in a unified manner. First, we uniformly represent diverse inputs and outputs of subtasks as the sequences of tokens. Then, based on the unified sequence format, we naturally leverage an identical encoder-decoder architecture with Transformers for different subtasks. Moreover, based on the above two kinds of unification, we further develop a single model that supports all subtasks concurrently. Experiments on two public datasets demonstrate that while simple, UniLayout significantly outperforms the previous task-specific methods.
2022-08-17 03:03:07 - PolyU-BPCoMa: A Dataset and Benchmark Towards Mobile Colorized Mapping Using a Backpack Multisensorial System - *Wenzhong Shi, Pengxin Chen, Muyang Wang, Sheng Bao, Haodong Xiang, Yue Yu, Daping Yang* - `2206.07468v2` - [abs](http://arxiv.org/abs/2206.07468v2) - [pdf](http://arxiv.org/pdf/2206.07468v2) > Constructing colorized point clouds from mobile laser scanning and images is a fundamental work in surveying and mapping. It is also an essential prerequisite for building digital twins for smart cities. However, existing public datasets are either in relatively small scales or lack accurate geometrical and color ground truth. This paper documents a multisensorial dataset named PolyU-BPCoMA which is distinctively positioned towards mobile colorized mapping. The dataset incorporates resources of 3D LiDAR, spherical imaging, GNSS and IMU on a backpack platform. Color checker boards are pasted in each surveyed area as targets and ground truth data are collected by an advanced terrestrial laser scanner (TLS). 3D geometrical and color information can be recovered in the colorized point clouds produced by the backpack system and the TLS, respectively. Accordingly, we provide an opportunity to benchmark the mapping and colorization accuracy simultaneously for a mobile multisensorial system. The dataset is approximately 800 GB in size covering both indoor and outdoor environments. The dataset and development kits are available at https://github.com/chenpengxin/PolyU-BPCoMa.git.
2022-08-17 09:02:09 - KAM -- a Kernel Attention Module for Emotion Classification with EEG Data - *Dongyang Kuang, Craig Michoski* - `2208.08161v1` - [abs](http://arxiv.org/abs/2208.08161v1) - [pdf](http://arxiv.org/pdf/2208.08161v1) > In this work, a kernel attention module is presented for the task of EEG-based emotion classification with neural networks. The proposed module utilizes a self-attention mechanism by performing a kernel trick, demanding significantly fewer trainable parameters and computations than standard attention modules. The design also provides a scalar for quantitatively examining the amount of attention assigned during deep feature refinement, hence help better interpret a trained model. Using EEGNet as the backbone model, extensive experiments are conducted on the SEED dataset to assess the module's performance on within-subject classification tasks compared to other SOTA attention modules. Requiring only one extra parameter, the inserted module is shown to boost the base model's mean prediction accuracy up to more than 1\% across 15 subjects. A key component of the method is the interpretability of solutions, which is addressed using several different techniques, and is included throughout as part of the dependency analysis.
2022-08-17 12:20:57 - On the Elements of Datasets for Cyber Physical Systems Security - *Ashraf Tantawy* - `2208.08255v1` - [abs](http://arxiv.org/abs/2208.08255v1) - [pdf](http://arxiv.org/pdf/2208.08255v1) > Datasets are essential to apply AI algorithms to Cyber Physical System (CPS) Security. Due to scarcity of real CPS datasets, researchers elected to generate their own datasets using either real or virtualized testbeds. However, unlike other AI domains, a CPS is a complex system with many interfaces that determine its behavior. A dataset that comprises merely a collection of sensor measurements and network traffic may not be sufficient to develop resilient AI defensive or offensive agents. In this paper, we study the \emph{elements} of CPS security datasets required to capture the system behavior and interactions, and propose a dataset architecture that has the potential to enhance the performance of AI algorithms in securing cyber physical systems. The framework includes dataset elements, attack representation, and required dataset features. We compare existing datasets to the proposed architecture to identify the current limitations and discuss the future of CPS dataset generation using testbeds.
2022-08-17 14:48:39 - Deep Optical Coding Design in Computational Imaging - *Henry Arguello, Jorge Bacca, Hasindu Kariyawasam, Edwin Vargas, Miguel Marquez, Ramith Hettiarachchi, Hans Garcia, Kithmini Herath, Udith Haputhanthri, Balpreet Singh Ahluwalia, Peter So, Dushan N. Wadduwage, Chamira U. S. Edussooriya* - `2207.00164v2` - [abs](http://arxiv.org/abs/2207.00164v2) - [pdf](http://arxiv.org/pdf/2207.00164v2) > Computational optical imaging (COI) systems leverage optical coding elements (CE) in their setups to encode a high-dimensional scene in a single or multiple snapshots and decode it by using computational algorithms. The performance of COI systems highly depends on the design of its main components: the CE pattern and the computational method used to perform a given task. Conventional approaches rely on random patterns or analytical designs to set the distribution of the CE. However, the available data and algorithm capabilities of deep neural networks (DNNs) have opened a new horizon in CE data-driven designs that jointly consider the optical encoder and computational decoder. Specifically, by modeling the COI measurements through a fully differentiable image formation model that considers the physics-based propagation of light and its interaction with the CEs, the parameters that define the CE and the computational decoder can be optimized in an end-to-end (E2E) manner. Moreover, by optimizing just CEs in the same framework, inference tasks can be performed from pure optics. This work surveys the recent advances on CE data-driven design and provides guidelines on how to parametrize different optical elements to include them in the E2E framework. Since the E2E framework can handle different inference applications by changing the loss function and the DNN, we present low-level tasks such as spectral imaging reconstruction or high-level tasks such as pose estimation with privacy preserving enhanced by using optimal task-based optical architectures. Finally, we illustrate classification and 3D object recognition applications performed at the speed of the light using all-optics DNN.
2022-08-18 01:56:59 - AI in Human-computer Gaming: Techniques, Challenges and Opportunities - *Qiyue Yin, Jun Yang, Kaiqi Huang, Meijing Zhao, Wancheng Ni, Bin Liang, Yan Huang, Shu Wu, Liang Wang* - `2111.07631v2` - [abs](http://arxiv.org/abs/2111.07631v2) - [pdf](http://arxiv.org/pdf/2111.07631v2) > With breakthrough of the AlphaGo, human-computer gaming AI has ushered in a big explosion, attracting more and more researchers all around the world. As a recognized standard for testing artificial intelligence, various human-computer gaming AI systems (AIs) have been developed such as the Libratus, OpenAI Five and AlphaStar, beating professional human players. The rapid development of human-computer gaming AIs indicate a big step of decision making intelligence, and it seems that current techniques can handle very complex human-computer games. So, one natural question raises: what are the possible challenges of current techniques in human-computer gaming, and what are the future trends? To answer the above question, in this paper, we survey recent successful game AIs, covering board game AIs, card game AIs, first-person shooting game AIs and real time strategy game AIs. Through this survey, we 1) compare the main difficulties among different kinds of games and the corresponding techniques utilized for achieving professional human level AIs; 2) summarize the mainstream frameworks and techniques that can be properly relied on for developing AIs for complex human-computer gaming; 3) raise the challenges or drawbacks of current techniques in the successful AIs; and 4) try to point out future trends in human-computer gaming AIs. Finally, we hope this brief review can provide an introduction for beginners, and inspire insights for researchers in the field of AI in human-computer gaming.
2022-08-18 06:51:23 - Enhancing Diffusion-Based Image Synthesis with Robust Classifier Guidance - *Bahjat Kawar, Roy Ganz, Michael Elad* - `2208.08664v1` - [abs](http://arxiv.org/abs/2208.08664v1) - [pdf](http://arxiv.org/pdf/2208.08664v1) > Denoising diffusion probabilistic models (DDPMs) are a recent family of generative models that achieve state-of-the-art results. In order to obtain class-conditional generation, it was suggested to guide the diffusion process by gradients from a time-dependent classifier. While the idea is theoretically sound, deep learning-based classifiers are infamously susceptible to gradient-based adversarial attacks. Therefore, while traditional classifiers may achieve good accuracy scores, their gradients are possibly unreliable and might hinder the improvement of the generation results. Recent work discovered that adversarially robust classifiers exhibit gradients that are aligned with human perception, and these could better guide a generative process towards semantically meaningful images. We utilize this observation by defining and training a time-dependent adversarially robust classifier and use it as guidance for a generative diffusion model. In experiments on the highly challenging and diverse ImageNet dataset, our scheme introduces significantly more intelligible intermediate gradients, better alignment with theoretical findings, as well as improved generation results under several evaluation metrics. Furthermore, we conduct an opinion survey whose findings indicate that human raters prefer our method's results.
2022-08-18 07:17:40 - Tree species classification from hyperspectral data using graph-regularized neural networks - *Debmita Bandyopadhyay, Subhadip Mukherjee* - `2208.08675v1` - [abs](http://arxiv.org/abs/2208.08675v1) - [pdf](http://arxiv.org/pdf/2208.08675v1) > Manual labeling of tree species remains a challenging task, especially in tropical regions, owing to inaccessibility and labor-intensive ground-based surveys. Hyperspectral images (HSIs), through their narrow and contiguous bands, can assist in distinguishing tree species based on their spectral properties. Therefore, automated classification algorithms on HSI images can help augment the limited labeled information and generate a real-time classification map for various tree species. Achieving high classification accuracy with a limited amount of labeled information in an image is one of the key challenges that researchers have started addressing in recent years. We propose a novel graph-regularized neural network (GRNN) algorithm that encompasses the superpixel-based segmentation for graph construction, a pixel-wise neural network classifier, and the label propagation technique to generate an accurate classification map. GRNN outperforms several state-of-the-art techniques not only for the standard Indian Pines HSI but also achieves a high classification accuracy (approx. 92%) on a new HSI data set collected over the forests of French Guiana (FG) even when less than 1% of the pixels are labeled. We show that GRNN is not only competitive with the state-of-the-art semi-supervised methods, but also exhibits lower variance in accuracy for different number of training samples and over different independent random sampling of the labeled pixels for training.
2022-08-18 07:41:02 - T-Person-GAN: Text-to-Person Image Generation with Identity-Consistency and Manifold Mix-Up - *Lin Wu, Yang Wang, Feng Zheng, Qi Tian, Meng Wang* - `2208.12752v1` - [abs](http://arxiv.org/abs/2208.12752v1) - [pdf](http://arxiv.org/pdf/2208.12752v1) > In this paper, we present an end-to-end approach to generate high-resolution person images conditioned on texts only. State-of-the-art text-to-image generation models are mainly designed for center-object generation, e.g., flowers and birds. Unlike center-placed objects with similar shapes and orientation, person image generation is a more challenging task, for which we observe the followings: 1) the generated images for the same person exhibit visual details with identity-consistency, e.g., identity-related textures/clothes/shoes across the images, and 2) those images should be discriminant for being robust against the inter-person variations caused by visual ambiguities. To address the above challenges, we develop an effective generative model to produce person images with two novel mechanisms. In particular, our first mechanism (called T-Person-GAN-ID) is to integrate the one-stream generator with an identity-preserving network such that the representations of generated data are regularized in their feature space to ensure the identity-consistency. The second mechanism (called T-Person-GAN-ID-MM) is based on the manifold mix-up to produce mixed images via the linear interpolation across generated images from different manifold identities, and we further enforce such interpolated images to be linearly classified in the feature space. This amounts to learning a linear classification boundary that can perfectly separate images from two identities. Our proposed method is empirically validated to achieve a remarkable improvement in text-to-person image generation. Our architecture is orthogonal to StackGAN++ , and focuses on person image generation, with all of them together to enrich the spectrum of GANs for the image generation task. Codes are available on \url{https://github.com/linwu-github/Person-Image-Generation.git}.
2022-08-18 08:03:45 - Open Information Extraction from 2007 to 2022 -- A Survey - *Pai Liu, Wenyang Gao, Wenjie Dong, Songfang Huang, Yue Zhang* - `2208.08690v1` - [abs](http://arxiv.org/abs/2208.08690v1) - [pdf](http://arxiv.org/pdf/2208.08690v1) > Open information extraction is an important NLP task that targets extracting structured information from unstructured text without limitations on the relation type or the domain of the text. This survey paper covers open information extraction technologies from 2007 to 2022 with a focus on new models not covered by previous surveys. We propose a new categorization method from the source of information perspective to accommodate the development of recent OIE technologies. In addition, we summarize three major approaches based on task settings as well as current popular datasets and model evaluation metrics. Given the comprehensive review, several future directions are shown from datasets, source of information, output form, method, and evaluation metric aspects.
2022-08-18 08:16:26 - Causal Reasoning Meets Visual Representation Learning: A Prospective Study - *Yang Liu, Yushen Wei, Hong Yan, Guanbin Li, Liang Lin* - `2204.12037v7` - [abs](http://arxiv.org/abs/2204.12037v7) - [pdf](http://arxiv.org/pdf/2204.12037v7) > Visual representation learning is ubiquitous in various real-world applications, including visual comprehension, video understanding, multi-modal analysis, human-computer interaction, and urban computing. Due to the emergence of huge amounts of multi-modal heterogeneous spatial/temporal/spatial-temporal data in big data era, the lack of interpretability, robustness, and out-of-distribution generalization are becoming the challenges of the existing visual models. The majority of the existing methods tend to fit the original data/variable distributions and ignore the essential causal relations behind the multi-modal knowledge, which lacks unified guidance and analysis about why modern visual representation learning methods easily collapse into data bias and have limited generalization and cognitive abilities. Inspired by the strong inference ability of human-level agents, recent years have therefore witnessed great effort in developing causal reasoning paradigms to realize robust representation and model learning with good cognitive ability. In this paper, we conduct a comprehensive review of existing causal reasoning methods for visual representation learning, covering fundamental theories, models, and datasets. The limitations of current methods and datasets are also discussed. Moreover, we propose some prospective challenges, opportunities, and future research directions for benchmarking causal reasoning algorithms in visual representation learning. This paper aims to provide a comprehensive overview of this emerging field, attract attention, encourage discussions, bring to the forefront the urgency of developing novel causal reasoning methods, publicly available benchmarks, and consensus-building standards for reliable visual representation learning and related real-world applications more efficiently.
2022-08-18 09:28:03 - Intelligent problem-solving as integrated hierarchical reinforcement learning - *Manfred Eppe, Christian Gumbsch, Matthias Kerzel, Phuong D. H. Nguyen, Martin V. Butz, Stefan Wermter* - `2208.08731v1` - [abs](http://arxiv.org/abs/2208.08731v1) - [pdf](http://arxiv.org/pdf/2208.08731v1) > According to cognitive psychology and related disciplines, the development of complex problem-solving behaviour in biological agents depends on hierarchical cognitive mechanisms. Hierarchical reinforcement learning is a promising computational approach that may eventually yield comparable problem-solving behaviour in artificial agents and robots. However, to date the problem-solving abilities of many human and non-human animals are clearly superior to those of artificial systems. Here, we propose steps to integrate biologically inspired hierarchical mechanisms to enable advanced problem-solving skills in artificial agents. Therefore, we first review the literature in cognitive psychology to highlight the importance of compositional abstraction and predictive processing. Then we relate the gained insights with contemporary hierarchical reinforcement learning methods. Interestingly, our results suggest that all identified cognitive mechanisms have been implemented individually in isolated computational architectures, raising the question of why there exists no single unifying architecture that integrates them. As our final contribution, we address this question by providing an integrative perspective on the computational challenges to develop such a unifying architecture. We expect our results to guide the development of more sophisticated cognitively inspired hierarchical machine learning architectures.
2022-08-18 09:45:25 - Hierarchical principles of embodied reinforcement learning: A review - *Manfred Eppe, Christian Gumbsch, Matthias Kerzel, Phuong D. H. Nguyen, Martin V. Butz, Stefan Wermter* - `2012.10147v2` - [abs](http://arxiv.org/abs/2012.10147v2) - [pdf](http://arxiv.org/pdf/2012.10147v2) > Cognitive Psychology and related disciplines have identified several critical mechanisms that enable intelligent biological agents to learn to solve complex problems. There exists pressing evidence that the cognitive mechanisms that enable problem-solving skills in these species build on hierarchical mental representations. Among the most promising computational approaches to provide comparable learning-based problem-solving abilities for artificial agents and robots is hierarchical reinforcement learning. However, so far the existing computational approaches have not been able to equip artificial agents with problem-solving abilities that are comparable to intelligent animals, including human and non-human primates, crows, or octopuses. Here, we first survey the literature in Cognitive Psychology, and related disciplines, and find that many important mental mechanisms involve compositional abstraction, curiosity, and forward models. We then relate these insights with contemporary hierarchical reinforcement learning methods, and identify the key machine intelligence approaches that realise these mechanisms. As our main result, we show that all important cognitive mechanisms have been implemented independently in isolated computational architectures, and there is simply a lack of approaches that integrate them appropriately. We expect our results to guide the development of more sophisticated cognitively inspired hierarchical methods, so that future artificial agents achieve a problem-solving performance on the level of intelligent animals.
2022-08-18 11:06:31 - Wind Turbine Blade Surface Damage Detection based on Aerial Imagery and VGG16-RCNN Framework - *Juhi Patel, Lagan Sharma, Harsh S. Dhiman* - `2108.08636v2` - [abs](http://arxiv.org/abs/2108.08636v2) - [pdf](http://arxiv.org/pdf/2108.08636v2) > In this manuscript, an image analytics based deep learning framework for wind turbine blade surface damage detection is proposed. Turbine blade(s) which carry approximately one-third of a turbine weight are susceptible to damage and can cause sudden malfunction of a grid-connected wind energy conversion system. The surface damage detection of wind turbine blade requires a large dataset so as to detect a type of damage at an early stage. Turbine blade images are captured via aerial imagery. Upon inspection, it is found that the image dataset was limited and hence image augmentation is applied to improve blade image dataset. The approach is modeled as a multi-class supervised learning problem and deep learning methods like Convolutional neural network (CNN), VGG16-RCNN and AlexNet are tested for determining the potential capability of turbine blade surface damage.
2022-08-18 11:32:04 - Efficient data-driven gap filling of satellite image time series using deep neural networks with partial convolutions - *Marius Appel* - `2208.08781v1` - [abs](http://arxiv.org/abs/2208.08781v1) - [pdf](http://arxiv.org/pdf/2208.08781v1) > The abundance of gaps in satellite image time series often complicates the application of deep learning models such as convolutional neural networks for spatiotemporal modeling. Based on previous work in computer vision on image inpainting, this paper shows how three-dimensional spatiotemporal partial convolutions can be used as layers in neural networks to fill gaps in satellite image time series. To evaluate the approach, we apply a U-Net-like model on incomplete image time series of quasi-global carbon monoxide observations from the Sentinel-5P satellite. Prediction errors were comparable to two considered statistical approaches while computation times for predictions were up to three orders of magnitude faster, making the approach applicable to process large amounts of satellite data. Partial convolutions can be added as layers to other types of neural networks, making it relatively easy to integrate with existing deep learning models. However, the approach does not quantify prediction errors and further research is needed to understand and improve model transferability. The implementation of spatiotemporal partial convolutions and the U-Net-like model is available as open-source software.
2022-08-18 14:44:40 - Lessons from a Space Lab -- An Image Acquisition Perspective - *Leo Pauly, Michele Lynn Jamrozik, Miguel Ortiz Del Castillo, Olivia Borgue, Inder Pal Singh, Mohatashem Reyaz Makhdoomi, Olga-Orsalia Christidi-Loumpasefski, Vincent Gaudilliere, Carol Martinez, Arunkumar Rathinam, Andreas Hein, Miguel Olivares Mendez, Djamila Aouada* - `2208.08865v1` - [abs](http://arxiv.org/abs/2208.08865v1) - [pdf](http://arxiv.org/pdf/2208.08865v1) > The use of Deep Learning (DL) algorithms has improved the performance of vision-based space applications in recent years. However, generating large amounts of annotated data for training these DL algorithms has proven challenging. While synthetically generated images can be used, the DL models trained on synthetic data are often susceptible to performance degradation, when tested in real-world environments. In this context, the Interdisciplinary Center of Security, Reliability and Trust (SnT) at the University of Luxembourg has developed the 'SnT Zero-G Lab', for training and validating vision-based space algorithms in conditions emulating real-world space environments. An important aspect of the SnT Zero-G Lab development was the equipment selection. From the lessons learned during the lab development, this article presents a systematic approach combining market survey and experimental analyses for equipment selection. In particular, the article focus on the image acquisition equipment in a space lab: background materials, cameras and illumination lamps. The results from the experiment analyses show that the market survey complimented by experimental analyses is required for effective equipment selection in a space lab development project.
2022-08-18 17:56:44 - EGCR: Explanation Generation for Conversational Recommendation - *Bingbing Wen, Xiaoning Bu, Chirag Shah* - `2208.08035v2` - [abs](http://arxiv.org/abs/2208.08035v2) - [pdf](http://arxiv.org/pdf/2208.08035v2) > Growing attention has been paid in Conversational Recommendation System (CRS), which works as a conversation-based and recommendation task-oriented tool to provide items of interest and explore user preference. However, existing work in CRS fails to explicitly show the reasoning logic to users and the whole CRS still remains a black box. Therefore we propose a novel end-to-end framework named Explanation Generation for Conversational Recommendation (EGCR) based on generating explanations for conversational agents to explain why they make the action. EGCR incorporates user reviews to enhance the item representation and increase the informativeness of the whole conversation. To the best of our knowledge, this is the first framework for explainable conversational recommendation on real-world datasets. Moreover, we evaluate EGCR on one benchmark conversational recommendation datasets and achieve better performance on both recommendation accuracy and conversation quality than other state-of-the art models. Finally, extensive experiments demonstrate that generated explanations are not only having high quality and explainability, but also making CRS more trustworthy. We will make our code available to contribute to the CRS community
2022-08-18 18:22:42 - A Kind Introduction to Lexical and Grammatical Aspect, with a Survey of Computational Approaches - *Annemarie Friedrich, Nianwen Xue, Alexis Palmer* - `2208.09012v1` - [abs](http://arxiv.org/abs/2208.09012v1) - [pdf](http://arxiv.org/pdf/2208.09012v1) > Aspectual meaning refers to how the internal temporal structure of situations is presented. This includes whether a situation is described as a state or as an event, whether the situation is finished or ongoing, and whether it is viewed as a whole or with a focus on a particular phase. This survey gives an overview of computational approaches to modeling lexical and grammatical aspect along with intuitive explanations of the necessary linguistic concepts and terminology. In particular, we describe the concepts of stativity, telicity, habituality, perfective and imperfective, as well as influential inventories of eventuality and situation types. We argue that because aspect is a crucial component of semantics, especially when it comes to reporting the temporal structure of situations in a precise way, future NLP approaches need to be able to handle and evaluate it systematically in order to achieve human-level language understanding.
2022-08-18 18:31:40 - Treeformer: Dense Gradient Trees for Efficient Attention Computation - *Lovish Madaan, Srinadh Bhojanapalli, Himanshu Jain, Prateek Jain* - `2208.09015v1` - [abs](http://arxiv.org/abs/2208.09015v1) - [pdf](http://arxiv.org/pdf/2208.09015v1) > Standard inference and training with transformer based architectures scale quadratically with input sequence length. This is prohibitively large for a variety of applications especially in web-page translation, query-answering etc. Consequently, several approaches have been developed recently to speedup attention computation by enforcing different attention structures such as sparsity, low-rank, approximating attention using kernels. In this work, we view attention computation as that of nearest neighbor retrieval, and use decision tree based hierarchical navigation to reduce the retrieval cost per query token from linear in sequence length to nearly logarithmic. Based on such hierarchical navigation, we design Treeformer which can use one of two efficient attention layers -- TF-Attention and TC-Attention. TF-Attention computes the attention in a fine-grained style, while TC-Attention is a coarse attention layer which also ensures that the gradients are "dense". To optimize such challenging discrete layers, we propose a two-level bootstrapped training method. Using extensive experiments on standard NLP benchmarks, especially for long-sequences, we demonstrate that our Treeformer architecture can be almost as accurate as baseline Transformer while using 30x lesser FLOPs in the attention layer. Compared to Linformer, the accuracy can be as much as 12% higher while using similar FLOPs in the attention layer.
2022-08-19 02:16:59 - A Survey on Unsupervised Visual Industrial Anomaly Detection Algorithms - *Yajie Cui, Zhaoxiang Liu, Shiguo Lian* - `2204.11161v3` - [abs](http://arxiv.org/abs/2204.11161v3) - [pdf](http://arxiv.org/pdf/2204.11161v3) > In line with the development of Industry 4.0, more and more attention is attracted to the field of surface defect detection. Improving efficiency as well as saving labor costs has steadily become a matter of great concern in industry field, where deep learning-based algorithms performs better than traditional vision inspection methods in recent years. While existing deep learning-based algorithms are biased towards supervised learning, which not only necessitates a huge amount of labeled data and a significant amount of labor, but it is also inefficient and has certain limitations. In contrast, recent research shows that unsupervised learning has great potential in tackling above disadvantages for visual industrial anomaly detection. In this survey, we summarize current challenges and provide a thorough overview of recently proposed unsupervised algorithms for visual industrial anomaly detection covering five categories, whose innovation points and frameworks are described in detail. Meanwhile, information on publicly available datasets containing surface image samples are provided. By comparing different classes of methods, the advantages and disadvantages of anomaly detection algorithms are summarized. It is expected to assist both the research community and industry in developing a broader and cross-domain perspective.
2022-08-19 07:32:34 - Synthetic Data in Human Analysis: A Survey - *Indu Joshi, Marcel Grimmer, Christian Rathgeb, Christoph Busch, Francois Bremond, Antitza Dantcheva* - `2208.09191v1` - [abs](http://arxiv.org/abs/2208.09191v1) - [pdf](http://arxiv.org/pdf/2208.09191v1) > Deep neural networks have become prevalent in human analysis, boosting the performance of applications, such as biometric recognition, action recognition, as well as person re-identification. However, the performance of such networks scales with the available training data. In human analysis, the demand for large-scale datasets poses a severe challenge, as data collection is tedious, time-expensive, costly and must comply with data protection laws. Current research investigates the generation of \textit{synthetic data} as an efficient and privacy-ensuring alternative to collecting real data in the field. This survey introduces the basic definitions and methodologies, essential when generating and employing synthetic data for human analysis. We conduct a survey that summarises current state-of-the-art methods and the main benefits of using synthetic data. We also provide an overview of publicly available synthetic datasets and generation models. Finally, we discuss limitations, as well as open research problems in this field. This survey is intended for researchers and practitioners in the field of human analysis.
2022-08-19 07:41:28 - Towards Informed Design and Validation Assistance in Computer Games Using Imitation Learning - *Alessandro Sestini, Joakim Bergdahl, Konrad Tollmar, Andrew D. Bagdanov, Linus Gisslén* - `2208.07811v2` - [abs](http://arxiv.org/abs/2208.07811v2) - [pdf](http://arxiv.org/pdf/2208.07811v2) > In games, as in and many other domains, design validation and testing is a huge challenge as systems are growing in size and manual testing is becoming infeasible. This paper proposes a new approach to automated game validation and testing. Our method leverages a data-driven imitation learning technique, which requires little effort and time and no knowledge of machine learning or programming, that designers can use to efficiently train game testing agents. We investigate the validity of our approach through a user study with industry experts. The survey results show that our method is indeed a valid approach to game validation and that data-driven programming would be a useful aid to reducing effort and increasing quality of modern playtesting. The survey also highlights several open challenges. With the help of the most recent literature, we analyze the identified challenges and propose future research directions suitable for supporting and maximizing the utility of our approach.
2022-08-19 08:03:50 - Transformers in Medical Image Analysis: A Review - *Kelei He, Chen Gan, Zhuoyuan Li, Islem Rekik, Zihao Yin, Wen Ji, Yang Gao, Qian Wang, Junfeng Zhang, Dinggang Shen* - `2202.12165v3` - [abs](http://arxiv.org/abs/2202.12165v3) - [pdf](http://arxiv.org/pdf/2202.12165v3) > Transformers have dominated the field of natural language processing, and recently impacted the computer vision area. In the field of medical image analysis, Transformers have also been successfully applied to full-stack clinical applications, including image synthesis/reconstruction, registration, segmentation, detection, and diagnosis. Our paper aims to promote awareness and application of Transformers in the field of medical image analysis. Specifically, we first overview the core concepts of the attention mechanism built into Transformers and other basic components. Second, we review various Transformer architectures tailored for medical image applications and discuss their limitations. Within this review, we investigate key challenges revolving around the use of Transformers in different learning paradigms, improving the model efficiency, and their coupling with other techniques. We hope this review can give a comprehensive picture of Transformers to the readers in the field of medical image analysis.
2022-08-19 11:29:58 - A Survey of Methods for Automated Algorithm Configuration - *Elias Schede, Jasmin Brandt, Alexander Tornede, Marcel Wever, Viktor Bengs, Eyke Hüllermeier, Kevin Tierney* - `2202.01651v2` - [abs](http://arxiv.org/abs/2202.01651v2) - [pdf](http://arxiv.org/pdf/2202.01651v2) > Algorithm configuration (AC) is concerned with the automated search of the most suitable parameter configuration of a parametrized algorithm. There is currently a wide variety of AC problem variants and methods proposed in the literature. Existing reviews do not take into account all derivatives of the AC problem, nor do they offer a complete classification scheme. To this end, we introduce taxonomies to describe the AC problem and features of configuration methods, respectively. We review existing AC literature within the lens of our taxonomies, outline relevant design choices of configuration approaches, contrast methods and problem variants against each other, and describe the state of AC in industry. Finally, our review provides researchers and practitioners with a look at future research directions in the field of AC.
2022-08-19 13:17:57 - Causal Intervention Improves Implicit Sentiment Analysis - *Siyin Wang, Jie Zhou, Changzhi Sun, Junjie Ye, Tao Gui, Qi Zhang, Xuanjing Huang* - `2208.09329v1` - [abs](http://arxiv.org/abs/2208.09329v1) - [pdf](http://arxiv.org/pdf/2208.09329v1) > Despite having achieved great success for sentiment analysis, existing neural models struggle with implicit sentiment analysis. This may be due to the fact that they may latch onto spurious correlations ("shortcuts", e.g., focusing only on explicit sentiment words), resulting in undermining the effectiveness and robustness of the learned model. In this work, we propose a causal intervention model for Implicit Sentiment Analysis using Instrumental Variable (ISAIV). We first review sentiment analysis from a causal perspective and analyze the confounders existing in this task. Then, we introduce an instrumental variable to eliminate the confounding causal effects, thus extracting the pure causal effect between sentence and sentiment. We compare the proposed ISAIV model with several strong baselines on both the general implicit sentiment analysis and aspect-based implicit sentiment analysis tasks. The results indicate the great advantages of our model and the efficacy of implicit sentiment reasoning.
2022-08-19 15:49:47 - PrepNet: A Convolutional Auto-Encoder to Homogenize CT Scans for Cross-Dataset Medical Image Analysis - *Mohammadreza Amirian, Javier A. Montoya-Zegarra, Jonathan Gruss, Yves D. Stebler, Ahmet Selman Bozkir, Marco Calandri, Friedhelm Schwenker, Thilo Stadelmann* - `2208.09408v1` - [abs](http://arxiv.org/abs/2208.09408v1) - [pdf](http://arxiv.org/pdf/2208.09408v1) > With the spread of COVID-19 over the world, the need arose for fast and precise automatic triage mechanisms to decelerate the spread of the disease by reducing human efforts e.g. for image-based diagnosis. Although the literature has shown promising efforts in this direction, reported results do not consider the variability of CT scans acquired under varying circumstances, thus rendering resulting models unfit for use on data acquired using e.g. different scanner technologies. While COVID-19 diagnosis can now be done efficiently using PCR tests, this use case exemplifies the need for a methodology to overcome data variability issues in order to make medical image analysis models more widely applicable. In this paper, we explicitly address the variability issue using the example of COVID-19 diagnosis and propose a novel generative approach that aims at erasing the differences induced by e.g. the imaging technology while simultaneously introducing minimal changes to the CT scans through leveraging the idea of deep auto-encoders. The proposed prepossessing architecture (PrepNet) (i) is jointly trained on multiple CT scan datasets and (ii) is capable of extracting improved discriminative features for improved diagnosis. Experimental results on three public datasets (SARS-COVID-2, UCSD COVID-CT, MosMed) show that our model improves cross-dataset generalization by up to $11.84$ percentage points despite a minor drop in within dataset performance.
2022-08-19 18:13:27 - Topical: Learning Repository Embeddings from Source Code using Attention - *Agathe Lherondelle, Yash Satsangi, Fran Silavong, Shaltiel Eloul, Sean Moran* - `2208.09495v1` - [abs](http://arxiv.org/abs/2208.09495v1) - [pdf](http://arxiv.org/pdf/2208.09495v1) > Machine learning on source code (MLOnCode) promises to transform how software is delivered. By mining the context and relationship between software artefacts, MLOnCode augments the software developers capabilities with code auto-generation, code recommendation, code auto-tagging and other data-driven enhancements. For many of these tasks a script level representation of code is sufficient, however, in many cases a repository level representation that takes into account various dependencies and repository structure is imperative, for example, auto-tagging repositories with topics or auto-documentation of repository code etc. Existing methods for computing repository level representations suffer from (a) reliance on natural language documentation of code (for example, README files) (b) naive aggregation of method/script-level representation, for example, by concatenation or averaging. This paper introduces Topical a deep neural network to generate repository level embeddings of publicly available GitHub code repositories directly from source code. Topical incorporates an attention mechanism that projects the source code, the full dependency graph and the script level textual information into a dense repository-level representation. To compute the repository-level representations, Topical is trained to predict the topics associated with a repository, on a dataset of publicly available GitHub repositories that were crawled along with their ground truth topic tags. Our experiments show that the embeddings computed by Topical are able to outperform multiple baselines, including baselines that naively combine the method-level representations through averaging or concatenation at the task of repository auto-tagging.
2022-08-19 18:26:35 - Explainable Biometrics in the Age of Deep Learning - *Pedro C. Neto, Tiago Gonçalves, João Ribeiro Pinto, Wilson Silva, Ana F. Sequeira, Arun Ross, Jaime S. Cardoso* - `2208.09500v1` - [abs](http://arxiv.org/abs/2208.09500v1) - [pdf](http://arxiv.org/pdf/2208.09500v1) > Systems capable of analyzing and quantifying human physical or behavioral traits, known as biometrics systems, are growing in use and application variability. Since its evolution from handcrafted features and traditional machine learning to deep learning and automatic feature extraction, the performance of biometric systems increased to outstanding values. Nonetheless, the cost of this fast progression is still not understood. Due to its opacity, deep neural networks are difficult to understand and analyze, hence, hidden capacities or decisions motivated by the wrong motives are a potential risk. Researchers have started to pivot their focus towards the understanding of deep neural networks and the explanation of their predictions. In this paper, we provide a review of the current state of explainable biometrics based on the study of 47 papers and discuss comprehensively the direction in which this field should be developed.
2022-08-19 18:58:53 - Globus Automation Services: Research process automation across the space-time continuum - *Ryan Chard, Jim Pruyne, Kurt McKee, Josh Bryan, Brigitte Raumann, Rachana Ananthakrishnan, Kyle Chard, Ian Foster* - `2208.09513v1` - [abs](http://arxiv.org/abs/2208.09513v1) - [pdf](http://arxiv.org/pdf/2208.09513v1) > Research process automation--the reliable, efficient, and reproducible execution of linked sets of actions on scientific instruments, computers, data stores, and other resources--has emerged as an essential element of modern science. We report here on new services within the Globus research data management platform that enable the specification of diverse research processes as reusable sets of actions, flows, and the execution of such flows in heterogeneous research environments. To support flows with broad spatial extent (e.g., from scientific instrument to remote data center) and temporal extent (from seconds to weeks), these Globus automation services feature: 1) cloud hosting for reliable execution of even long-lived flows despite sporadic failures; 2) a declarative notation, and extensible asynchronous action provider API, for defining and executing a wide variety of actions and flow specifications involving arbitrary resources; 3) authorization delegation mechanisms for secure invocation of actions. These services permit researchers to outsource and automate the management of a broad range of research tasks to a reliable, scalable, and secure cloud platform. We present use cases for Globus automation services, describe the design and implementation of the services, present microbenchmark studies, and review experiences applying the services in a range of applications
2022-08-20 02:15:44 - Learning in Audio-visual Context: A Review, Analysis, and New Perspective - *Yake Wei, Di Hu, Yapeng Tian, Xuelong Li* - `2208.09579v1` - [abs](http://arxiv.org/abs/2208.09579v1) - [pdf](http://arxiv.org/pdf/2208.09579v1) > Sight and hearing are two senses that play a vital role in human communication and scene understanding. To mimic human perception ability, audio-visual learning, aimed at developing computational approaches to learn from both audio and visual modalities, has been a flourishing field in recent years. A comprehensive survey that can systematically organize and analyze studies of the audio-visual field is expected. Starting from the analysis of audio-visual cognition foundations, we introduce several key findings that have inspired our computational studies. Then, we systematically review the recent audio-visual learning studies and divide them into three categories: audio-visual boosting, cross-modal perception and audio-visual collaboration. Through our analysis, we discover that, the consistency of audio-visual data across semantic, spatial and temporal support the above studies. To revisit the current development of the audio-visual learning field from a more macro view, we further propose a new perspective on audio-visual scene understanding, then discuss and analyze the feasible future direction of the audio-visual learning area. Overall, this survey reviews and outlooks the current audio-visual learning field from different aspects. We hope it can provide researchers with a better understanding of this area. A website including constantly-updated survey is released: \url{https://gewu-lab.github.io/audio-visual-learning/}.
2022-08-20 03:21:44 - Review on Action Recognition for Accident Detection in Smart City Transportation Systems - *Victor Adewopo, Nelly Elsayed, Zag ElSayed, Murat Ozer, Ahmed Abdelgawad, Magdy Bayoumi* - `2208.09588v1` - [abs](http://arxiv.org/abs/2208.09588v1) - [pdf](http://arxiv.org/pdf/2208.09588v1) > Action detection and public traffic safety are crucial aspects of a safe community and a better society. Monitoring traffic flows in a smart city using different surveillance cameras can play a significant role in recognizing accidents and alerting first responders. The utilization of action recognition (AR) in computer vision tasks has contributed towards high-precision applications in video surveillance, medical imaging, and digital signal processing. This paper presents an intensive review focusing on action recognition in accident detection and autonomous transportation systems for a smart city. In this paper, we focused on AR systems that used diverse sources of traffic video capturing, such as static surveillance cameras on traffic intersections, highway monitoring cameras, drone cameras, and dash-cams. Through this review, we identified the primary techniques, taxonomies, and algorithms used in AR for autonomous transportation and accident detection. We also examined data sets utilized in the AR tasks, identifying the main sources of datasets and features of the datasets. This paper provides potential research direction to develop and integrate accident detection systems for autonomous cars and public traffic safety systems by alerting emergency personnel and law enforcement in the event of road accidents to minimize human error in accident reporting and provide a spontaneous response to victims
2022-08-20 03:25:58 - Data-Driven Causal Effect Estimation Based on Graphical Causal Modelling: A Survey - *Debo Cheng, Jiuyong Li, Lin Liu, Jixue Liu, Thuc Duy Le* - `2208.09590v1` - [abs](http://arxiv.org/abs/2208.09590v1) - [pdf](http://arxiv.org/pdf/2208.09590v1) > In many fields of scientific research and real-world applications, unbiased estimation of causal effects from non-experimental data is crucial for understanding the mechanism underlying the data and for decision-making on effective responses or interventions. A great deal of research has been conducted on this challenging problem from different angles. For causal effect estimation in data, assumptions such as Markov property, faithfulness and causal sufficiency are always made. Under the assumptions, full knowledge such as, a set of covariates or an underlying causal graph, is still required. A practical challenge is that in many applications, no such full knowledge or only some partial knowledge is available. In recent years, research has emerged to use a search strategy based on graphical causal modelling to discover useful knowledge from data for causal effect estimation, with some mild assumptions, and has shown promose in tackling the practical challenge. In this survey, we review the methods and focus on the challenges the data-driven methods face. We discuss the assumptions, strengths and limitations of the data-driven methods. We hope this review will motivate more researchers to design better data-driven methods based on graphical causal modelling for the challenging problem of causal effect estimation.
2022-08-21 07:02:19 - Memristive Computing for Efficient Inference on Resource Constrained Devices - *Venkatesh Rammamoorthy, Geng Zhao, Bharathi Reddy, Ming-Yang Lin* - `2208.10490v1` - [abs](http://arxiv.org/abs/2208.10490v1) - [pdf](http://arxiv.org/pdf/2208.10490v1) > The advent of deep learning has resulted in a number of applications which have transformed the landscape of the research area in which it has been applied. However, with an increase in popularity, the complexity of classical deep neural networks has increased over the years. As a result, this has leads to considerable problems during deployment on devices with space and time constraints. In this work, we perform a review of the present advancements in non-volatile memory and how the use of resistive RAM memory, particularly memristors, can help to progress the state of research in deep learning. In other words, we wish to present an ideology that advances in the field of memristive technology can greatly influence and impact deep learning inference on edge devices.
2022-08-21 17:31:31 - Deepfake: Definitions, Performance Metrics and Standards, Datasets and Benchmarks, and a Meta-Review - *Enes Altuncu, Virginia N. L. Franqueira, Shujun Li* - `2208.10913v1` - [abs](http://arxiv.org/abs/2208.10913v1) - [pdf](http://arxiv.org/pdf/2208.10913v1) > Recent advancements in AI, especially deep learning, have contributed to a significant increase in the creation of new realistic-looking synthetic media (video, image, and audio) and manipulation of existing media, which has led to the creation of the new term ``deepfake''. Based on both the research literature and resources in English and in Chinese, this paper gives a comprehensive overview of deepfake, covering multiple important aspects of this emerging concept, including 1) different definitions, 2) commonly used performance metrics and standards, and 3) deepfake-related datasets, challenges, competitions and benchmarks. In addition, the paper also reports a meta-review of 12 selected deepfake-related survey papers published in 2020 and 2021, focusing not only on the mentioned aspects, but also on the analysis of key challenges and recommendations. We believe that this paper is the most comprehensive review of deepfake in terms of aspects covered, and the first one covering both the English and Chinese literature and sources.
2022-08-21 19:32:07 - Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations - *Siddhant Arora, Danish Pruthi, Norman Sadeh, William W. Cohen, Zachary C. Lipton, Graham Neubig* - `2112.09669v2` - [abs](http://arxiv.org/abs/2112.09669v2) - [pdf](http://arxiv.org/pdf/2112.09669v2) > In attempts to "explain" predictions of machine learning models, researchers have proposed hundreds of techniques for attributing predictions to features that are deemed important. While these attributions are often claimed to hold the potential to improve human "understanding" of the models, surprisingly little work explicitly evaluates progress towards this aspiration. In this paper, we conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews. They are challenged both to simulate the model on fresh reviews, and to edit reviews with the goal of lowering the probability of the originally predicted class. Successful manipulations would lead to an adversarial example. During the training (but not the test) phase, input spans are highlighted to communicate salience. Through our evaluation, we observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control. For the BERT-based classifier, popular local explanations do not improve their ability to reduce the model confidence over the no-explanation case. Remarkably, when the explanation for the BERT model is given by the (global) attributions of a linear model trained to imitate the BERT model, people can effectively manipulate the model.
2022-08-22 03:58:02 - Reference-Limited Compositional Zero-Shot Learning - *Siteng Huang, Qiyao Wei, Donglin Wang* - `2208.10046v1` - [abs](http://arxiv.org/abs/2208.10046v1) - [pdf](http://arxiv.org/pdf/2208.10046v1) > Compositional zero-shot learning (CZSL) refers to recognizing unseen compositions of known visual primitives, which is an essential ability for artificial intelligence systems to learn and understand the world. While considerable progress has been made on existing benchmarks, we suspect whether popular CZSL methods can address the challenges of few-shot and few referential compositions, which is common when learning in real-world unseen environments. To this end, we study the challenging reference-limited compositional zero-shot learning (RL-CZSL) problem in this paper, i.e. , given limited seen compositions that contain only a few samples as reference, unseen compositions of observed primitives should be identified. We propose a novel Meta Compositional Graph Learner (MetaCGL) that can efficiently learn the compositionality from insufficient referential information and generalize to unseen compositions. Besides, we build a benchmark with two new large-scale datasets that consist of natural images with diverse compositional labels, providing more realistic environments for RL-CZSL. Extensive experiments in the benchmarks show that our method achieves state-of-the-art performance in recognizing unseen compositions when reference is limited for compositional learning.
2022-08-22 06:18:18 - Isoform Function Prediction Using Deep Neural Network - *Sara Ghazanfari, Ali Rasteh, Seyed Abolfazl Motahari, Mahdieh Soleymani Baghshah* - `2208.03325v2` - [abs](http://arxiv.org/abs/2208.03325v2) - [pdf](http://arxiv.org/pdf/2208.03325v2) > Isoforms are mRNAs produced from the same gene site in the phenomenon called Alternative Splicing. Studies have shown that more than 95% of human multi-exon genes have undergone alternative splicing. Although there are few changes in mRNA sequence, They may have a systematic effect on cell function and regulation. It is widely reported that isoforms of a gene have distinct or even contrasting functions. Most studies have shown that alternative splicing plays a significant role in human health and disease. Despite the wide range of gene function studies, there is little information about isoforms' functionalities. Recently, some computational methods based on Multiple Instance Learning have been proposed to predict isoform function using gene function and gene expression profile. However, their performance is not desirable due to the lack of labeled training data. In addition, probabilistic models such as Conditional Random Field (CRF) have been used to model the relation between isoforms. This project uses all the data and valuable information such as isoform sequences, expression profiles, and gene ontology graphs and proposes a comprehensive model based on Deep Neural Networks. The UniProt Gene Ontology (GO) database is used as a standard reference for gene functions. The NCBI RefSeq database is used for extracting gene and isoform sequences, and the NCBI SRA database is used for expression profile data. Metrics such as Receiver Operating Characteristic Area Under the Curve (ROC AUC) and Precision-Recall Under the Curve (PR AUC) are used to measure the prediction accuracy.
2022-08-22 07:18:23 - Recent Advances in Text-to-SQL: A Survey of What We Have and What We Expect - *Naihao Deng, Yulong Chen, Yue Zhang* - `2208.10099v1` - [abs](http://arxiv.org/abs/2208.10099v1) - [pdf](http://arxiv.org/pdf/2208.10099v1) > Text-to-SQL has attracted attention from both the natural language processing and database communities because of its ability to convert the semantics in natural language into SQL queries and its practical application in building natural language interfaces to database systems. The major challenges in text-to-SQL lie in encoding the meaning of natural utterances, decoding to SQL queries, and translating the semantics between these two forms. These challenges have been addressed to different extents by the recent advances. However, there is still a lack of comprehensive surveys for this task. To this end, we review recent progress on text-to-SQL for datasets, methods, and evaluation and provide this systematic survey, addressing the aforementioned challenges and discussing potential future directions. We hope that this survey can serve as quick access to existing work and motivate future research.
2022-08-22 12:00:48 - Towards Label-efficient Automatic Diagnosis and Analysis: A Comprehensive Survey of Advanced Deep Learning-based Weakly-supervised, Semi-supervised and Self-supervised Techniques in Histopathological Image Analysis - *Linhao Qu, Siyu Liu, Xiaoyu Liu, Manning Wang, Zhijian Song* - `2208.08789v2` - [abs](http://arxiv.org/abs/2208.08789v2) - [pdf](http://arxiv.org/pdf/2208.08789v2) > Histopathological images contain abundant phenotypic information and pathological patterns, which are the gold standards for disease diagnosis and essential for the prediction of patient prognosis and treatment outcome. In recent years, computer-automated analysis techniques for histopathological images have been urgently required in clinical practice, and deep learning methods represented by convolutional neural networks have gradually become the mainstream in the field of digital pathology. However, obtaining large numbers of fine-grained annotated data in this field is a very expensive and difficult task, which hinders the further development of traditional supervised algorithms based on large numbers of annotated data. More recent studies have started to liberate from the traditional supervised paradigm, and the most representative ones are the studies on weakly supervised learning paradigm based on weak annotation, semi-supervised learning paradigm based on limited annotation, and self-supervised learning paradigm based on pathological image representation learning. These new methods have led a new wave of automatic pathological image diagnosis and analysis targeted at annotation efficiency. With a survey of over 130 papers, we present a comprehensive and systematic review of the latest studies on weakly supervised learning, semi-supervised learning, and self-supervised learning in the field of computational pathology from both technical and methodological perspectives. Finally, we present the key challenges and future trends for these techniques.
2022-08-22 12:10:27 - Survey of NLP in Pharmacology: Methodology, Tasks, Resources, Knowledge, and Tools - *Dimitar Trajanov, Vangel Trajkovski, Makedonka Dimitrieva, Jovana Dobreva, Milos Jovanovik, Matej Klemen, Aleš Žagar, Marko Robnik-Šikonja* - `2208.10228v1` - [abs](http://arxiv.org/abs/2208.10228v1) - [pdf](http://arxiv.org/pdf/2208.10228v1) > Natural language processing (NLP) is an area of artificial intelligence that applies information technologies to process the human language, understand it to a certain degree, and use it in various applications. This area has rapidly developed in the last few years and now employs modern variants of deep neural networks to extract relevant patterns from large text corpora. The main objective of this work is to survey the recent use of NLP in the field of pharmacology. As our work shows, NLP is a highly relevant information extraction and processing approach for pharmacology. It has been used extensively, from intelligent searches through thousands of medical documents to finding traces of adversarial drug interactions in social media. We split our coverage into five categories to survey modern NLP methodology, commonly addressed tasks, relevant textual data, knowledge bases, and useful programming libraries. We split each of the five categories into appropriate subcategories, describe their main properties and ideas, and summarize them in a tabular form. The resulting survey presents a comprehensive overview of the area, useful to practitioners and interested observers.
2022-08-22 12:12:43 - Design Automation for Fast, Lightweight, and Effective Deep Learning Models: A Survey - *Dalin Zhang, Kaixuan Chen, Yan Zhao, Bin Yang, Lina Yao, Christian S. Jensen* - `2208.10498v1` - [abs](http://arxiv.org/abs/2208.10498v1) - [pdf](http://arxiv.org/pdf/2208.10498v1) > Deep learning technologies have demonstrated remarkable effectiveness in a wide range of tasks, and deep learning holds the potential to advance a multitude of applications, including in edge computing, where deep models are deployed on edge devices to enable instant data processing and response. A key challenge is that while the application of deep models often incurs substantial memory and computational costs, edge devices typically offer only very limited storage and computational capabilities that may vary substantially across devices. These characteristics make it difficult to build deep learning solutions that unleash the potential of edge devices while complying with their constraints. A promising approach to addressing this challenge is to automate the design of effective deep learning models that are lightweight, require only a little storage, and incur only low computational overheads. This survey offers comprehensive coverage of studies of design automation techniques for deep learning models targeting edge computing. It offers an overview and comparison of key metrics that are used commonly to quantify the proficiency of models in terms of effectiveness, lightness, and computational costs. The survey then proceeds to cover three categories of the state-of-the-art of deep model design automation techniques: automated neural architecture search, automated model compression, and joint automated design and compression. Finally, the survey covers open issues and directions for future research.
2022-08-22 14:27:48 - Colloquium: Advances in automation of quantum dot devices control - *Justyna P. Zwolak, Jacob M. Taylor* - `2112.09362v2` - [abs](http://arxiv.org/abs/2112.09362v2) - [pdf](http://arxiv.org/pdf/2112.09362v2) > Arrays of quantum dots (QDs) are a promising candidate system to realize scalable, coupled qubits systems and serve as a fundamental building block for quantum computers. In such semiconductor quantum systems, devices now have tens of individual electrostatic and dynamical voltages that must be carefully set to localize the system into the single-electron regime and to realize good qubit operational performance. The mapping of requisite QD locations and charges to gate voltages presents a challenging classical control problem. With an increasing number of QD qubits, the relevant parameter space grows sufficiently to make heuristic control unfeasible.In recent years, there has been a considerable effort to automate device control that combines script-based algorithms with machine learning (ML) techniques. In this colloquium, we present a comprehensive overview of the recent progress in the automation of QD device control, with a particular emphasis on silicon- and GaAs-based QDs formed in two-dimensional electron gases. Combining physics-based modeling with modern numerical optimization and ML has proven quite effective in yielding efficient, scalable control. Further integration of theoretical, computational, and experimental efforts with computer science and ML holds tremendous potential in advancing semiconductor and other platforms for quantum computing.
2022-08-22 14:51:29 - Collaborative Perception for Autonomous Driving: Current Status and Future Trend - *Shunli Ren, Siheng Chen, Wenjun Zhang* - `2208.10371v1` - [abs](http://arxiv.org/abs/2208.10371v1) - [pdf](http://arxiv.org/pdf/2208.10371v1) > Perception is one of the crucial module of the autonomous driving system, which has made great progress recently. However, limited ability of individual vehicles results in the bottleneck of improvement of the perception performance. To break through the limits of individual perception, collaborative perception has been proposed which enables vehicles to share information to perceive the environments beyond line-of-sight and field-of-view. In this paper, we provide a review of the related work about the promising collaborative perception technology, including introducing the fundamental concepts, generalizing the collaboration modes and summarizing the key ingredients and applications of collaborative perception. Finally, we discuss the open challenges and issues of this research area and give some potential further directions.
2022-08-22 15:48:35 - On Deep Learning in Password Guessing, a Survey - *Fangyi Yu* - `2208.10413v1` - [abs](http://arxiv.org/abs/2208.10413v1) - [pdf](http://arxiv.org/pdf/2208.10413v1) > The security of passwords is dependent on a thorough understanding of the strategies used by attackers. Unfortunately, real-world adversaries use pragmatic guessing tactics like dictionary attacks, which are difficult to simulate in password security research. Dictionary attacks must be carefully configured and modified to be representative of the actual threat. This approach, however, needs domain-specific knowledge and expertise that are difficult to duplicate. This paper compares various deep learning-based password guessing approaches that do not require domain knowledge or assumptions about users' password structures and combinations. The involved model categories are Recurrent Neural Networks, Generative Adversarial Networks, Autoencoder, and Attention mechanisms. Additionally, we proposed a promising research experimental design on using variations of IWGAN on password guessing under non-targeted offline attacks. Using these advanced strategies, we can enhance password security and create more accurate and efficient Password Strength Meters.
2022-08-22 18:26:43 - Some Supervision Required: Incorporating Oracle Policies in Reinforcement Learning via Epistemic Uncertainty Metrics - *Jun Jet Tai, Jordan K. Terry, Mauro S. Innocente, James Brusey, Nadjim Horri* - `2208.10533v1` - [abs](http://arxiv.org/abs/2208.10533v1) - [pdf](http://arxiv.org/pdf/2208.10533v1) > An inherent problem in reinforcement learning is coping with policies that are uncertain about what action to take (or the value of a state). Model uncertainty, more formally known as epistemic uncertainty, refers to the expected prediction error of a model beyond the sampling noise. In this paper, we propose a metric for epistemic uncertainty estimation in Q-value functions, which we term pathwise epistemic uncertainty. We further develop a method to compute its approximate upper bound, which we call F -value. We experimentally apply the latter to Deep Q-Networks (DQN) and show that uncertainty estimation in reinforcement learning serves as a useful indication of learning progress. We then propose a new approach to improving sample efficiency in actor-critic algorithms by learning from an existing (previously learned or hard-coded) oracle policy while uncertainty is high, aiming to avoid unproductive random actions during training. We term this Critic Confidence Guided Exploration (CCGE). We implement CCGE on Soft Actor-Critic (SAC) using our F-value metric, which we apply to a handful of popular Gym environments and show that it achieves better sample efficiency and total episodic reward than vanilla SAC in limited contexts.
2022-08-22 20:47:35 - A Survey and Framework of Cooperative Perception: From Heterogeneous Singleton to Hierarchical Cooperation - *Zhengwei Bai, Guoyuan Wu, Matthew J. Barth, Yongkang Liu, Emrah Akin Sisbot, Kentaro Oguchi, Zhitong Huang* - `2208.10590v1` - [abs](http://arxiv.org/abs/2208.10590v1) - [pdf](http://arxiv.org/pdf/2208.10590v1) > Perceiving the environment is one of the most fundamental keys to enabling Cooperative Driving Automation (CDA), which is regarded as the revolutionary solution to addressing the safety, mobility, and sustainability issues of contemporary transportation systems. Although an unprecedented evolution is now happening in the area of computer vision for object perception, state-of-the-art perception methods are still struggling with sophisticated real-world traffic environments due to the inevitably physical occlusion and limited receptive field of single-vehicle systems. Based on multiple spatially separated perception nodes, Cooperative Perception (CP) is born to unlock the bottleneck of perception for driving automation. In this paper, we comprehensively review and analyze the research progress on CP and, to the best of our knowledge, this is the first time to propose a unified CP framework. Architectures and taxonomy of CP systems based on different types of sensors are reviewed to show a high-level description of the workflow and different structures for CP systems. Node structure, sensor modality, and fusion schemes are reviewed and analyzed with comprehensive literature to provide detailed explanations of specific methods. A Hierarchical CP framework is proposed, followed by a review of existing Datasets and Simulators to sketch an overall landscape of CP. Discussion highlights the current opportunities, open challenges, and anticipated future trends.
2022-08-22 22:36:57 - "Are you okay, honey?": Recognizing Emotions among Couples Managing Diabetes in Daily Life using Multimodal Real-World Smartwatch Data - *George Boateng, Xiangyu Zhao, Malgorzata Speichert, Elgar Fleisch, Janina Lüscher, Theresa Pauly, Urte Scholz, Guy Bodenmann, Tobias Kowatsch* - `2208.08909v2` - [abs](http://arxiv.org/abs/2208.08909v2) - [pdf](http://arxiv.org/pdf/2208.08909v2) > Couples generally manage chronic diseases together and the management takes an emotional toll on both patients and their romantic partners. Consequently, recognizing the emotions of each partner in daily life could provide an insight into their emotional well-being in chronic disease management. Currently, the process of assessing each partner's emotions is manual, time-intensive, and costly. Despite the existence of works on emotion recognition among couples, none of these works have used data collected from couples' interactions in daily life. In this work, we collected 85 hours (1,021 5-minute samples) of real-world multimodal smartwatch sensor data (speech, heart rate, accelerometer, and gyroscope) and self-reported emotion data (n=612) from 26 partners (13 couples) managing diabetes mellitus type 2 in daily life. We extracted physiological, movement, acoustic, and linguistic features, and trained machine learning models (support vector machine and random forest) to recognize each partner's self-reported emotions (valence and arousal). Our results from the best models (balanced accuracies of 63.8% and 78.1% for arousal and valence respectively) are better than chance and our prior work that also used data from German-speaking, Swiss-based couples, albeit, in the lab. This work contributes toward building automated emotion recognition systems that would eventually enable partners to monitor their emotions in daily life and enable the delivery of interventions to improve their emotional well-being.
2022-08-23 00:39:40 - Recent Advances in Vision Transformer: A Survey and Outlook of Recent Work - *Khawar Islam* - `2203.01536v4` - [abs](http://arxiv.org/abs/2203.01536v4) - [pdf](http://arxiv.org/pdf/2203.01536v4) > Vision Transformers (ViTs) are becoming more popular and dominating technique for various vision tasks, compare to Convolutional Neural Networks (CNNs). As a demanding technique in computer vision, ViTs have been successfully solved various vision problems while focusing on long-range relationships. In this paper, we begin by introducing the fundamental concepts and background of the self-attention mechanism. Next, we provide a comprehensive overview of recent top-performing ViT methods describing in terms of strength and weakness, computational cost as well as training and testing dataset. We thoroughly compare the performance of various ViT algorithms and most representative CNN methods on popular benchmark datasets. Finally, we explore some limitations with insightful observations and provide further research direction. The project page along with the collections of papers are available at https://github.com/khawar512/ViT-Survey
2022-08-23 01:06:51 - Knowledge Extraction in Low-Resource Scenarios: Survey and Perspective - *Shumin Deng, Ningyu Zhang, Feiyu Xiong, Jeff Z. Pan, Huajun Chen* - `2202.08063v2` - [abs](http://arxiv.org/abs/2202.08063v2) - [pdf](http://arxiv.org/pdf/2202.08063v2) > Knowledge Extraction (KE) which aims to extract structural information from unstructured texts often suffers from data scarcity and emerging unseen types, i.e., low-resource scenarios. Many neural approaches on low-resource KE have been widely investigated and achieved impressive performance. In this paper, we present a literature review towards KE in low-resource scenarios, and systematically categorize existing works into three paradigms: (1) exploiting higher-resource data, (2) exploiting stronger models, and (3) exploiting data and models together. In addition, we describe promising applications and outline some potential directions for future research. We hope that our survey can help both the academic and industrial community to better understand this field, inspire more ideas and boost broader applications.
2022-08-23 02:28:45 - Measuring Uncertainty in Signal Fingerprinting with Gaussian Processes Going Deep - *Ran Guan, Andi Zhang, Mengchao Li, Yongliang Wang* - `2109.04360v5` - [abs](http://arxiv.org/abs/2109.04360v5) - [pdf](http://arxiv.org/pdf/2109.04360v5) > In indoor positioning, signal fluctuation is highly location-dependent. However, signal uncertainty is one critical yet commonly overlooked dimension of the radio signal to be fingerprinted. This paper reviews the commonly used Gaussian Processes (GP) for probabilistic positioning and points out the pitfall of using GP to model signal fingerprint uncertainty. This paper also proposes Deep Gaussian Processes (DGP) as a more informative alternative to address the issue. How DGP better measures uncertainty in signal fingerprinting is evaluated via simulated and realistically collected datasets.
2022-08-23 08:38:52 - A Survey of Self-Supervised and Few-Shot Object Detection - *Gabriel Huang, Issam Laradji, David Vazquez, Simon Lacoste-Julien, Pau Rodriguez* - `2110.14711v3` - [abs](http://arxiv.org/abs/2110.14711v3) - [pdf](http://arxiv.org/pdf/2110.14711v3) > Labeling data is often expensive and time-consuming, especially for tasks such as object detection and instance segmentation, which require dense labeling of the image. While few-shot object detection is about training a model on novel (unseen) object classes with little data, it still requires prior training on many labeled examples of base (seen) classes. On the other hand, self-supervised methods aim at learning representations from unlabeled data which transfer well to downstream tasks such as object detection. Combining few-shot and self-supervised object detection is a promising research direction. In this survey, we review and characterize the most recent approaches on few-shot and self-supervised object detection. Then, we give our main takeaways and discuss future research directions. Project page at https://gabrielhuang.github.io/fsod-survey/
2022-08-23 12:01:16 - A Comprehensive Study of Real-Time Object Detection Networks Across Multiple Domains: A Survey - *Elahe Arani, Shruthi Gowda, Ratnajit Mukherjee, Omar Magdy, Senthilkumar Kathiresan, Bahram Zonooz* - `2208.10895v1` - [abs](http://arxiv.org/abs/2208.10895v1) - [pdf](http://arxiv.org/pdf/2208.10895v1) > Deep neural network based object detectors are continuously evolving and are used in a multitude of applications, each having its own set of requirements. While safety-critical applications need high accuracy and reliability, low-latency tasks need resource and energy-efficient networks. Real-time detectors, which are a necessity in high-impact real-world applications, are continuously proposed, but they overemphasize the improvements in accuracy and speed while other capabilities such as versatility, robustness, resource and energy efficiency are omitted. A reference benchmark for existing networks does not exist, nor does a standard evaluation guideline for designing new networks, which results in ambiguous and inconsistent comparisons. We, thus, conduct a comprehensive study on multiple real-time detectors (anchor-, keypoint-, and transformer-based) on a wide range of datasets and report results on an extensive set of metrics. We also study the impact of variables such as image size, anchor dimensions, confidence thresholds, and architecture layers on the overall performance. We analyze the robustness of detection networks against distribution shifts, natural corruptions, and adversarial attacks. Also, we provide a calibration analysis to gauge the reliability of the predictions. Finally, to highlight the real-world impact, we conduct two unique case studies, on autonomous driving and healthcare applications. To further gauge the capability of networks in critical real-time applications, we report the performance after deploying the detection networks on edge devices. Our extensive empirical study can act as a guideline for the industrial community to make an informed choice on the existing networks. We also hope to inspire the research community towards a new direction in the design and evaluation of networks that focuses on a bigger and holistic overview for a far-reaching impact.
2022-08-23 12:48:53 - AI and 6G into the Metaverse: Fundamentals, Challenges and Future Research Trends - *Muhammad Zawish, Fayaz Ali Dharejo, Sunder Ali Khowaja, Kapal Dev, Steven Davy, Nawab Muhammad Faseeh Qureshi, Paolo Bellavista* - `2208.10921v1` - [abs](http://arxiv.org/abs/2208.10921v1) - [pdf](http://arxiv.org/pdf/2208.10921v1) > Since Facebook was renamed Meta, a lot of attention, debate, and exploration have intensified about what the Metaverse is, how it works, and the possible ways to exploit it. It is anticipated that Metaverse will be a continuum of rapidly emerging technologies, usecases, capabilities, and experiences that will make it up for the next evolution of the Internet. Several researchers have already surveyed the literature on artificial intelligence (AI) and wireless communications in realizing the Metaverse. However, due to the rapid emergence of technologies, there is a need for a comprehensive and in-depth review of the role of AI, 6G, and the nexus of both in realizing the immersive experiences of Metaverse. Therefore, in this survey, we first introduce the background and ongoing progress in augmented reality (AR), virtual reality (VR), mixed reality (MR) and spatial computing, followed by the technical aspects of AI and 6G. Then, we survey the role of AI in the Metaverse by reviewing the state-of-the-art in deep learning, computer vision, and edge AI. Next, we investigate the promising services of B5G/6G towards Metaverse, followed by identifying the role of AI in 6G networks and 6G networks for AI in support of Metaverse applications. Finally, we enlist the existing and potential applications, usecases, and projects to highlight the importance of progress in the Metaverse. Moreover, in order to provide potential research directions to researchers, we enlist the challenges, research gaps, and lessons learned identified from the literature review of the aforementioned technologies.
2022-08-23 14:05:30 - ULISSE: A Tool for One-shot Sky Exploration and its Application to Active Galactic Nuclei Detection - *Lars Doorenbos, Olena Torbaniuk, Stefano Cavuoti, Maurizio Paolillo, Giuseppe Longo, Massimo Brescia, Raphael Sznitman, Pablo Márquez-Neila* - `2208.10984v1` - [abs](http://arxiv.org/abs/2208.10984v1) - [pdf](http://arxiv.org/pdf/2208.10984v1) > Modern sky surveys are producing ever larger amounts of observational data, which makes the application of classical approaches for the classification and analysis of objects challenging and time-consuming. However, this issue may be significantly mitigated by the application of automatic machine and deep learning methods. We propose ULISSE, a new deep learning tool that, starting from a single prototype object, is capable of identifying objects sharing the same morphological and photometric properties, and hence of creating a list of candidate sosia. In this work, we focus on applying our method to the detection of AGN candidates in a Sloan Digital Sky Survey galaxy sample, since the identification and classification of Active Galactic Nuclei (AGN) in the optical band still remains a challenging task in extragalactic astronomy. Intended for the initial exploration of large sky surveys, ULISSE directly uses features extracted from the ImageNet dataset to perform a similarity search. The method is capable of rapidly identifying a list of candidates, starting from only a single image of a given prototype, without the need for any time-consuming neural network training. Our experiments show ULISSE is able to identify AGN candidates based on a combination of host galaxy morphology, color and the presence of a central nuclear source, with a retrieval efficiency ranging from 21% to 65% (including composite sources) depending on the prototype, where the random guess baseline is 12%. We find ULISSE to be most effective in retrieving AGN in early-type host galaxies, as opposed to prototypes with spiral- or late-type properties. Based on the results described in this work, ULISSE can be a promising tool for selecting different types of astrophysical objects in current and future wide-field surveys (e.g. Euclid, LSST etc.) that target millions of sources every single night.
2022-08-23 16:24:19 - QU-BraTS: MICCAI BraTS 2020 Challenge on Quantifying Uncertainty in Brain Tumor Segmentation - Analysis of Ranking Scores and Benchmarking Results - *Raghav Mehta, Angelos Filos, Ujjwal Baid, Chiharu Sako, Richard McKinley, Michael Rebsamen, Katrin Datwyler, Raphael Meier, Piotr Radojewski, Gowtham Krishnan Murugesan, Sahil Nalawade, Chandan Ganesh, Ben Wagner, Fang F. Yu, Baowei Fei, Ananth J. Madhuranthakam, Joseph A. Maldjian, Laura Daza, Catalina Gomez, Pablo Arbelaez, Chengliang Dai, Shuo Wang, Hadrien Reynaud, Yuan-han Mo, Elsa Angelini, Yike Guo, Wenjia Bai, Subhashis Banerjee, Lin-min Pei, Murat AK, Sarahi Rosas-Gonzalez, Ilyess Zemmoura, Clovis Tauber, Minh H. Vu, Tufve Nyholm, Tommy Lofstedt, Laura Mora Ballestar, Veronica Vilaplana, Hugh McHugh, Gonzalo Maso Talou, Alan Wang, Jay Patel, Ken Chang, Katharina Hoebel, Mishka Gidwani, Nishanth Arun, Sharut Gupta, Mehak Aggarwal, Praveer Singh, Elizabeth R. Gerstner, Jayashree Kalpathy-Cramer, Nicolas Boutry, Alexis Huard, Lasitha Vidyaratne, Md Monibor Rahman, Khan M. Iftekharuddin, Joseph Chazalon, Elodie Puybareau, Guillaume Tochon, Jun Ma, Mariano Cabezas, Xavier Llado, Arnau Oliver, Liliana Valencia, Sergi Valverde, Mehdi Amian, Mohammadreza Soltaninejad, Andriy Myronenko, Ali Hatamizadeh, Xue Feng, Quan Dou, Nicholas Tustison, Craig Meyer, Nisarg A. Shah, Sanjay Talbar, Marc-Andre Weber, Abhishek Mahajan, Andras Jakab, Roland Wiest, Hassan M. Fathallah-Shaykh, Arash Nazeri, Mikhail Milchenko1, Daniel Marcus, Aikaterini Kotrotsou, Rivka Colen, John Freymann, Justin Kirby, Christos Davatzikos, Bjoern Menze, Spyridon Bakas, Yarin Gal, Tal Arbel* - `2112.10074v2` - [abs](http://arxiv.org/abs/2112.10074v2) - [pdf](http://arxiv.org/pdf/2112.10074v2) > Deep learning (DL) models have provided state-of-the-art performance in various medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder translating DL models into clinical workflows. Quantifying the reliability of DL model predictions in the form of uncertainties could enable clinical review of the most uncertain regions, thereby building trust and paving the way toward clinical translation. Several uncertainty estimation methods have recently been introduced for DL medical image segmentation tasks. Developing scores to evaluate and compare the performance of uncertainty measures will assist the end-user in making more informed decisions. In this study, we explore and evaluate a score developed during the BraTS 2019 and BraTS 2020 task on uncertainty quantification (QU-BraTS) and designed to assess and rank uncertainty estimates for brain tumor multi-compartment segmentation. This score (1) rewards uncertainty estimates that produce high confidence in correct assertions and those that assign low confidence levels at incorrect assertions, and (2) penalizes uncertainty measures that lead to a higher percentage of under-confident correct assertions. We further benchmark the segmentation uncertainties generated by 14 independent participating teams of QU-BraTS 2020, all of which also participated in the main BraTS segmentation task. Overall, our findings confirm the importance and complementary value that uncertainty estimates provide to segmentation algorithms, highlighting the need for uncertainty quantification in medical image analyses. Finally, in favor of transparency and reproducibility, our evaluation code is made publicly available at: https://github.com/RagMeh11/QU-BraTS.
2022-08-23 18:49:22 - Two Decades of Bengali Handwritten Digit Recognition: A Survey - *A. B. M. Ashikur Rahman, Md. Bakhtiar Hasan, Sabbir Ahmed, Tasnim Ahmed, Md. Hamjajul Ashmafee, Mohammad Ridwan Kabir, Md. Hasanul Kabir* - `2206.02234v2` - [abs](http://arxiv.org/abs/2206.02234v2) - [pdf](http://arxiv.org/pdf/2206.02234v2) > Handwritten Digit Recognition (HDR) is one of the most challenging tasks in the domain of Optical Character Recognition (OCR). Irrespective of language, there are some inherent challenges of HDR, which mostly arise due to the variations in writing styles across individuals, writing medium and environment, inability to maintain the same strokes while writing any digit repeatedly, etc. In addition to that, the structural complexities of the digits of a particular language may lead to ambiguous scenarios of HDR. Over the years, researchers have developed numerous offline and online HDR pipelines, where different image processing techniques are combined with traditional Machine Learning (ML)-based and/or Deep Learning (DL)-based architectures. Although evidence of extensive review studies on HDR exists in the literature for languages, such as: English, Arabic, Indian, Farsi, Chinese, etc., few surveys on Bengali HDR (BHDR) can be found, which lack a comprehensive analysis of the challenges, the underlying recognition process, and possible future directions. In this paper, the characteristics and inherent ambiguities of Bengali handwritten digits along with a comprehensive insight of two decades of the state-of-the-art datasets and approaches towards offline BHDR have been analyzed. Furthermore, several real-life application-specific studies, which involve BHDR, have also been discussed in detail. This paper will also serve as a compendium for researchers interested in the science behind offline BHDR, instigating the exploration of newer avenues of relevant research that may further lead to better offline recognition of Bengali handwritten digits in different application areas.
2022-08-23 20:46:49 - On Fitness Landscape Analysis of Permutation Problems: From Distance Metrics to Mutation Operator Selection - *Vincent A. Cicirello* - `2208.11188v1` - [abs](http://arxiv.org/abs/2208.11188v1) - [pdf](http://arxiv.org/pdf/2208.11188v1) > In this paper, we explore the theory and expand upon the practice of fitness landscape analysis for optimization problems over the space of permutations. Many of the computational and analytical tools for fitness landscape analysis, such as fitness distance correlation, require identifying a distance metric for measuring the similarity of different solutions to the problem. We begin with a survey of the available distance metrics for permutations, and then use principal component analysis to classify these metrics. The result of this analysis aligns with existing classifications of permutation problem types produced through less formal means, including the A-permutation, R-permutation, and P-permutation types, which classifies problems by whether absolute position of permutation elements, relative positions of elements, or general precedence of pairs of elements, is the dominant influence over solution fitness. Additionally, the formal analysis identifies subtypes within these problem categories. We see that the classification can assist in identifying appropriate metrics based on optimization problem feature for use in fitness landscape analysis. Using optimization problems of each class, we also demonstrate how the classification scheme can subsequently inform the choice of mutation operator within an evolutionary algorithm. From this, we present a classification of a variety of mutation operators as a counterpart to that of the metrics. Our implementations of the permutation metrics, permutation mutation operators, and associated evolutionary algorithm, are available in a pair of open source Java libraries. All of the code necessary to recreate our analysis and experimental results are also available as open source.
2022-08-24 02:01:06 - SCALE: Online Self-Supervised Lifelong Learning without Prior Knowledge - *Xiaofan Yu, Yunhui Guo, Sicun Gao, Tajana Rosing* - `2208.11266v1` - [abs](http://arxiv.org/abs/2208.11266v1) - [pdf](http://arxiv.org/pdf/2208.11266v1) > Unsupervised lifelong learning refers to the ability to learn over time while memorizing previous patterns without supervision. Previous works assumed strong prior knowledge about the incoming data (e.g., knowing the class boundaries) which can be impossible to obtain in complex and unpredictable environments. In this paper, motivated by real-world scenarios, we formally define the online unsupervised lifelong learning problem with class-incremental streaming data, which is non-iid and single-pass. The problem is more challenging than existing lifelong learning problems due to the absence of labels and prior knowledge. To address the issue, we propose Self-Supervised ContrAstive Lifelong LEarning (SCALE) which extracts and memorizes knowledge on-the-fly. SCALE is designed around three major components: a pseudo-supervised contrastive loss, a self-supervised forgetting loss, and an online memory update for uniform subset selection. All three components are designed to work collaboratively to maximize learning performance. Our loss functions leverage pairwise similarity thus remove the dependency on supervision or prior knowledge. We perform comprehensive experiments of SCALE under iid and four non-iid data streams. SCALE outperforms the best state-of-the-art algorithm on all settings with improvements of up to 6.43%, 5.23% and 5.86% kNN accuracy on CIFAR-10, CIFAR-100 and SubImageNet datasets.
2022-08-24 04:26:21 - Semi-Supervised and Unsupervised Deep Visual Learning: A Survey - *Yanbei Chen, Massimiliano Mancini, Xiatian Zhu, Zeynep Akata* - `2208.11296v1` - [abs](http://arxiv.org/abs/2208.11296v1) - [pdf](http://arxiv.org/pdf/2208.11296v1) > State-of-the-art deep learning models are often trained with a large amount of costly labeled training data. However, requiring exhaustive manual annotations may degrade the model's generalizability in the limited-label regime. Semi-supervised learning and unsupervised learning offer promising paradigms to learn from an abundance of unlabeled visual data. Recent progress in these paradigms has indicated the strong benefits of leveraging unlabeled data to improve model generalization and provide better model initialization. In this survey, we review the recent advanced deep learning algorithms on semi-supervised learning (SSL) and unsupervised learning (UL) for visual recognition from a unified perspective. To offer a holistic understanding of the state-of-the-art in these areas, we propose a unified taxonomy. We categorize existing representative SSL and UL with comprehensive and insightful analysis to highlight their design rationales in different learning scenarios and applications in different computer vision tasks. Lastly, we discuss the emerging trends and open challenges in SSL and UL to shed light on future critical research directions.
2022-08-24 04:28:50 - KE-QI: A Knowledge Enhanced Article Quality Identification Dataset - *Chunhui Ai, Derui Wang, Xu Yan, Yang Xu, Wenrui Xie, Ziqiang Cao* - `2206.07556v3` - [abs](http://arxiv.org/abs/2206.07556v3) - [pdf](http://arxiv.org/pdf/2206.07556v3) > With so many articles of varying qualities being produced every moment, it is a very urgent task to screen outstanding articles and commit them to social media. To our best knowledge, there is a lack of datasets and mature research works in identifying high-quality articles. Consequently, we conduct some surveys and finalize 7 objective indicators to annotate the quality of 10k articles. During annotation, we find that many characteristics of high-quality articles (e.g., background) rely more on extensive external knowledge than inner semantic information of articles. In response, we link extracted article entities to Baidu Encyclopedia, then propose Knowledge Enhanced article Quality Identification (KE-QI) dataset. To make better use of external knowledge, we propose a compound model which fuses the text and external knowledge information via a gate unit to classify the quality of an article. Our experimental results on KE-QI show that with initialization of our pre-trained Node2Vec model, our model achieves about 78\% $F_1$, outperforming other baselines.
2022-08-24 13:10:48 - Diverse Title Generation for Stack Overflow Posts with Multiple Sampling Enhanced Transformer - *Fengji Zhang, Jin Liu, Yao Wan, Xiao Yu, Xiao Liu, Jacky Keung* - `2208.11523v1` - [abs](http://arxiv.org/abs/2208.11523v1) - [pdf](http://arxiv.org/pdf/2208.11523v1) > Stack Overflow is one of the most popular programming communities where developers can seek help for their encountered problems. Nevertheless, if inexperienced developers fail to describe their problems clearly, it is hard for them to attract sufficient attention and get the anticipated answers. We propose M$_3$NSCT5, a novel approach to automatically generate multiple post titles from the given code snippets. Developers may use the generated titles to find closely related posts and complete their problem descriptions. M$_3$NSCT5 employs the CodeT5 backbone, which is a pre-trained Transformer model having an excellent language understanding and generation ability. To alleviate the ambiguity issue that the same code snippets could be aligned with different titles under varying contexts, we propose the maximal marginal multiple nucleus sampling strategy to generate multiple high-quality and diverse title candidates at a time for the developers to choose from. We build a large-scale dataset with 890,000 question posts covering eight programming languages to validate the effectiveness of M$_3$NSCT5. The automatic evaluation results on the BLEU and ROUGE metrics demonstrate the superiority of M$_3$NSCT5 over six state-of-the-art baseline models. Moreover, a human evaluation with trustworthy results also demonstrates the great potential of our approach for real-world application.
2022-08-24 13:49:24 - Mapping Husserlian phenomenology onto active inference - *Mahault Albarracin, Riddhi J. Pitliya, Maxwell J. D. Ramstead, Jeffrey Yoshimi* - `2208.09058v2` - [abs](http://arxiv.org/abs/2208.09058v2) - [pdf](http://arxiv.org/pdf/2208.09058v2) > Phenomenology is the rigorous descriptive study of conscious experience. Recent attempts to formalize Husserlian phenomenology provide us with a mathematical model of perception as a function of prior knowledge and expectation. In this paper, we re-examine elements of Husserlian phenomenology through the lens of active inference. In doing so, we aim to advance the project of computational phenomenology, as recently outlined by proponents of active inference. We propose that key aspects of Husserl's descriptions of consciousness can be mapped onto aspects of the generative models associated with the active inference approach. We first briefly review active inference. We then discuss Husserl's phenomenology, with a focus on time consciousness. Finally, we present our mapping from Husserlian phenomenology to active inference.
2022-08-24 14:22:34 - Neural Network Normal Estimation and Bathymetry Reconstruction from Sidescan Sonar - *Yiping Xie, Nils Bore, John Folkesson* - `2206.07819v2` - [abs](http://arxiv.org/abs/2206.07819v2) - [pdf](http://arxiv.org/pdf/2206.07819v2) > Sidescan sonar intensity encodes information about the changes of surface normal of the seabed. However, other factors such as seabed geometry as well as its material composition also affect the return intensity. One can model these intensity changes in a forward direction from the surface normals from bathymetric map and physical properties to the measured intensity or alternatively one can use an inverse model which starts from the intensities and models the surface normals. Here we use an inverse model which leverages deep learning's ability to learn from data; a convolutional neural network is used to estimate the surface normal from the sidescan. Thus the internal properties of the seabed are only implicitly learned. Once this information is estimated, a bathymetric map can be reconstructed through an optimization framework that also includes altimeter readings to provide a sparse depth profile as a constraint. Implicit neural representation learning was recently proposed to represent the bathymetric map in such an optimization framework. In this article, we use a neural network to represent the map and optimize it under constraints of altimeter points and estimated surface normal from sidescan. By fusing multiple observations from different angles from several sidescan lines, the estimated results are improved through optimization. We demonstrate the efficiency and scalability of the approach by reconstructing a high-quality bathymetry using sidescan data from a large sidescan survey. We compare the proposed data-driven inverse model approach of modeling a sidescan with a forward Lambertian model. We assess the quality of each reconstruction by comparing it with data constructed from a multibeam sensor.
2022-08-24 16:42:59 - A Review of Knowledge Graph Completion - *Mohamad Zamini, Hassan Reza, Minou Rabiei* - `2208.11652v1` - [abs](http://arxiv.org/abs/2208.11652v1) - [pdf](http://arxiv.org/pdf/2208.11652v1) > Information extraction methods proved to be effective at triple extraction from structured or unstructured data. The organization of such triples in the form of (head entity, relation, tail entity) is called the construction of Knowledge Graphs (KGs). Most of the current knowledge graphs are incomplete. In order to use KGs in downstream tasks, it is desirable to predict missing links in KGs. Different approaches have been recently proposed for representation learning of KGs by embedding both entities and relations into a low-dimensional vector space aiming to predict unknown triples based on previously visited triples. According to how the triples will be treated independently or dependently, we divided the task of knowledge graph completion into conventional and graph neural network representation learning and we discuss them in more detail. In conventional approaches, each triple will be processed independently and in GNN-based approaches, triples also consider their local neighborhood. View Full-Text
2022-08-24 16:55:13 - Cryogenic Neuromorphic Hardware - *Md Mazharul Islam, Shamiul Alam, Md Shafayat Hossain, Kaushik Roy, Ahmedullah Aziz* - `2204.07503v2` - [abs](http://arxiv.org/abs/2204.07503v2) - [pdf](http://arxiv.org/pdf/2204.07503v2) > The revolution in artificial intelligence (AI) brings up an enormous storage and data processing requirement. Large power consumption and hardware overhead have become the main challenges for building next-generation AI hardware. To mitigate this, Neuromorphic computing has drawn immense attention due to its excellent capability for data processing with very low power consumption. While relentless research has been underway for years to minimize the power consumption in neuromorphic hardware, we are still a long way off from reaching the energy efficiency of the human brain. Furthermore, design complexity and process variation hinder the large-scale implementation of current neuromorphic platforms. Recently, the concept of implementing neuromorphic computing systems in cryogenic temperature has garnered intense interest thanks to their excellent speed and power metric. Several cryogenic devices can be engineered to work as neuromorphic primitives with ultra-low demand for power. Here we comprehensively review the cryogenic neuromorphic hardware. We classify the existing cryogenic neuromorphic hardware into several hierarchical categories and sketch a comparative analysis based on key performance metrics. Our analysis concisely describes the operation of the associated circuit topology and outlines the advantages and challenges encountered by the state-of-the-art technology platforms. Finally, we provide insights to circumvent these challenges for the future progression of research.
2022-08-24 22:45:36 - Dual Diffusion Implicit Bridges for Image-to-Image Translation - *Xuan Su, Jiaming Song, Chenlin Meng, Stefano Ermon* - `2203.08382v2` - [abs](http://arxiv.org/abs/2203.08382v2) - [pdf](http://arxiv.org/pdf/2203.08382v2) > Common image-to-image translation methods rely on joint training over data from both source and target domains. This prevents the training process from preserving privacy of domain data (e.g., in a federated setting), and often means that a new model has to be trained for a new pair of domains. We present Dual Diffusion Implicit Bridges (DDIBs), an image translation method based on diffusion models, that circumvents training on domain pairs. Image translation with DDIBs relies on two diffusion models trained independently on each domain, and is a two-step process: DDIBs first obtain latent encodings for source images with the source diffusion model, and then decode such encodings using the target model to construct target images. Both steps are defined via an ODE, thus the process is cycle consistent only up to discretization errors of the ODE solvers. Theoretically, we interpret DDIBs as concatenation of source to latent, and latent to target Schr\"odinger Bridges, a form of entropy-regularized optimal transport, to explain the efficacy of the method. Experimentally, we apply DDIBs on both synthetic and high-resolution image datasets, to demonstrate their utility in a wide variety of translation tasks and their connections to existing optimal transport methods.
2022-08-24 23:16:56 - A Hierarchical N-Gram Framework for Zero-Shot Link Prediction - *Mingchen Li, Junfan Chen, Samuel Mensah, Nikolaos Aletras, Xiulong Yang, Yang Ye* - `2204.10293v2` - [abs](http://arxiv.org/abs/2204.10293v2) - [pdf](http://arxiv.org/pdf/2204.10293v2) > Due to the incompleteness of knowledge graphs (KGs), zero-shot link prediction (ZSLP) which aims to predict unobserved relations in KGs has attracted recent interest from researchers. A common solution is to use textual features of relations (e.g., surface name or textual descriptions) as auxiliary information to bridge the gap between seen and unseen relations. Current approaches learn an embedding for each word token in the text. These methods lack robustness as they suffer from the out-of-vocabulary (OOV) problem. Meanwhile, models built on character n-grams have the capability of generating expressive representations for OOV words. Thus, in this paper, we propose a Hierarchical N-Gram framework for Zero-Shot Link Prediction (HNZSLP), which considers the dependencies among character n-grams of the relation surface name for ZSLP. Our approach works by first constructing a hierarchical n-gram graph on the surface name to model the organizational structure of n-grams that leads to the surface name. A GramTransformer, based on the Transformer is then presented to model the hierarchical n-gram graph to construct the relation embedding for ZSLP. Experimental results show the proposed HNZSLP achieved state-of-the-art performance on two ZSLP datasets.
2022-08-25 03:51:39 - Shortcut Learning of Large Language Models in Natural Language Understanding: A Survey - *Mengnan Du, Fengxiang He, Na Zou, Dacheng Tao, Xia Hu* - `2208.11857v1` - [abs](http://arxiv.org/abs/2208.11857v1) - [pdf](http://arxiv.org/pdf/2208.11857v1) > Large language models (LLMs) have achieved state-of-the-art performance on a series of natural language understanding tasks. However, these LLMs might rely on dataset bias and artifacts as shortcuts for prediction. This has significantly hurt their Out-of-Distribution (OOD) generalization and adversarial robustness. In this paper, we provide a review of recent developments that address the robustness challenge of LLMs. We first introduce the concepts and robustness challenge of LLMs. We then introduce methods to identify shortcut learning behavior in LLMs, characterize the reasons for shortcut learning, as well as introduce mitigation solutions. Finally, we identify key challenges and introduce the connections of this line of research to other directions.
2022-08-25 08:47:27 - Rail break and derailment prediction using Probabilistic Graphical Modelling - *Rebecca M. C. Taylor, Johan A. du Preez* - `2208.11940v1` - [abs](http://arxiv.org/abs/2208.11940v1) - [pdf](http://arxiv.org/pdf/2208.11940v1) > Rail breaks are one of the most common causes of derailments internationally. This is no different for the South African Iron Ore line. Many rail breaks occur as a heavy-haul train passes over a crack, large defect or defective weld. In such cases, it is usually too late for the train to slow down in time to prevent a de-railment. Knowing the risk of a rail break occurring associated with a train passing over a section of rail allows for better implementation of maintenance initiatives and mitigating measures. In this paper the Ore Line's specific challenges are discussed and the currently available data that can be used to create a rail break risk prediction model is reviewed. The development of a basic rail break risk prediction model for the Ore Line is then presented. Finally the insight gained from the model is demonstrated by means of discussing various scenarios of various rail break risk. In future work, we are planning on extending this basic model to allow input from live monitoring systems such as the ultrasonic broken rail detection system.
2022-08-25 09:55:25 - Understanding Diffusion Models: A Unified Perspective - *Calvin Luo* - `2208.11970v1` - [abs](http://arxiv.org/abs/2208.11970v1) - [pdf](http://arxiv.org/pdf/2208.11970v1) > Diffusion models have shown incredible capabilities as generative models; indeed, they power the current state-of-the-art models on text-conditioned image generation such as Imagen and DALL-E 2. In this work we review, demystify, and unify the understanding of diffusion models across both variational and score-based perspectives. We first derive Variational Diffusion Models (VDM) as a special case of a Markovian Hierarchical Variational Autoencoder, where three key assumptions enable tractable computation and scalable optimization of the ELBO. We then prove that optimizing a VDM boils down to learning a neural network to predict one of three potential objectives: the original source input from any arbitrary noisification of it, the original source noise from any arbitrarily noisified input, or the score function of a noisified input at any arbitrary noise level. We then dive deeper into what it means to learn the score function, and connect the variational perspective of a diffusion model explicitly with the Score-based Generative Modeling perspective through Tweedie's Formula. Lastly, we cover how to learn a conditional distribution using diffusion models via guidance.
2022-08-25 10:21:23 - On Reality and the Limits of Language Data - *Nigel H. Collier, Fangyu Liu, Ehsan Shareghi* - `2208.11981v1` - [abs](http://arxiv.org/abs/2208.11981v1) - [pdf](http://arxiv.org/pdf/2208.11981v1) > Recent advances in neural network language models have shown that it is possible to derive expressive meaning representations by leveraging linguistic associations in large-scale natural language data. These potentially Gestalt representations have enabled state-of-the-art performance for many practical applications. It would appear that we are on a pathway to empirically deriving a robust and expressive computable semantics. A key question that arises is how far can language data alone enable computers to understand the necessary truth about the physical world? Attention to this question is warranted because our future interactions with intelligent machines depends on how well our techniques correctly represent and process the concepts (objects, properties, and processes) that humans commonly observe to be true. After reviewing existing protocols, the objective of this work is to explore this question using a novel and tightly controlled reasoning test and to highlight what models might learn directly from pure linguistic data.
2022-08-25 12:07:28 - A review of ontologies for smart and continuous commissioning - *Sara Gilani, Caroline Quinn, J. J. McArthur* - `2205.07636v2` - [abs](http://arxiv.org/abs/2205.07636v2) - [pdf](http://arxiv.org/pdf/2205.07636v2) > Smart and continuous commissioning (SCCx) of buildings can result in a significant reduction in the gap between design and operational performance. Ontologies play an important role in SCCx as they facilitate data readability and reasoning by machines. A better understanding of ontologies is required in order to develop and incorporate them in SCCx. This paper critically reviews the state-of-the-art research on building data ontologies since 2014 within the SCCx domain through sorting them based on building data types, general approaches, and applications. The data types of two main domains of building information modeling and building management system have been considered in the majority of existing ontologies. Three main applications are evident from a critical analysis of existing ontologies: (1) key performance indicator calculation, (2) building performance improvement, and (3) fault detection and diagnosis. The key gaps found in the literature review are a holistic ontology for SCCx and insight on how such approaches should be evaluated. Based on these findings, this study provides recommendations for future necessary research including: identification of SCCx-related data types, assessment of ontology performance, and creation of open-source approaches.
2022-08-25 14:44:06 - Towards deep observation: A systematic survey on artificial intelligence techniques to monitor fetus via Ultrasound Images - *Mahmood Alzubaidi, Marco Agus, Khalid Alyafei, Khaled A Althelaya, Uzair Shah, Alaa Abd-Alrazaq, Mohammed Anbar, Michel Makhlouf, Mowafa Househ* - `2201.07935v2` - [abs](http://arxiv.org/abs/2201.07935v2) - [pdf](http://arxiv.org/pdf/2201.07935v2) > Developing innovative informatics approaches aimed to enhance fetal monitoring is a burgeoning field of study in reproductive medicine. Several reviews have been conducted regarding Artificial intelligence (AI) techniques to improve pregnancy outcomes. They are limited by focusing on specific data such as mother's care during pregnancy. This systematic survey aims to explore how artificial intelligence (AI) can assist with fetal growth monitoring via Ultrasound (US) image. We used eight medical and computer science bibliographic databases, including PubMed, Embase, PsycINFO, ScienceDirect, IEEE explore, ACM Library, Google Scholar, and the Web of Science. We retrieved studies published between 2010 to 2021. Data extracted from studies were synthesized using a narrative approach. Out of 1269 retrieved studies, we included 107 distinct studies from queries that were relevant to the topic in the survey. We found that 2D ultrasound images were more popular (n=88) than 3D and 4D ultrasound images (n=19). Classification is the most used method (n=42), followed by segmentation (n=31), classification integrated with segmentation (n=16) and other miscellaneous such as object-detection, regression and reinforcement learning (n=18). The most common areas within the pregnancy domain were the fetus head (n=43), then fetus body (n=31), fetus heart (n=13), fetus abdomen (n=10), and lastly the fetus face (n=10). In the most recent studies, deep learning techniques were primarily used (n=81), followed by machine learning (n=16), artificial neural network (n=7), and reinforcement learning (n=2). AI techniques played a crucial role in predicting fetal diseases and identifying fetus anatomy structures during pregnancy. More research is required to validate this technology from a physician's perspective, such as pilot studies and randomized controlled trials on AI and its applications in a hospital setting.
2022-08-25 14:44:53 - AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results - *Ren Yang, Radu Timofte, Xin Li, Qi Zhang, Lin Zhang, Fanglong Liu, Dongliang He, Fu li, He Zheng, Weihang Yuan, Pavel Ostyakov, Dmitry Vyal, Magauiya Zhussip, Xueyi Zou, Youliang Yan, Lei Li, Jingzhu Tang, Ming Chen, Shijie Zhao, Yu Zhu, Xiaoran Qin, Chenghua Li, Cong Leng, Jian Cheng, Claudio Rota, Marco Buzzelli, Simone Bianco, Raimondo Schettini, Dafeng Zhang, Feiyu Huang, Shizhuo Liu, Xiaobing Wang, Zhezhu Jin, Bingchen Li, Xin Li, Mingxi Li, Ding Liu, Wenbin Zou, Peijie Dong, Tian Ye, Yunchen Zhang, Ming Tan, Xin Niu, Mustafa Ayazoglu, Marcos Conde, Ui-Jin Choi, Zhuang Jia, Tianyu Xu, Yijian Zhang, Mao Ye, Dengyan Luo, Xiaofeng Pan, Liuhan Peng* - `2208.11184v2` - [abs](http://arxiv.org/abs/2208.11184v2) - [pdf](http://arxiv.org/pdf/2208.11184v2) > This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. This challenge includes two tracks. Track 1 aims at the super-resolution of compressed image, and Track~2 targets the super-resolution of compressed video. In Track 1, we use the popular dataset DIV2K as the training, validation and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos, including the LDV 2.0 dataset (335 videos) and 30 additional videos. In this challenge, there are 12 teams and 2 teams that submitted the final results to Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state-of-the-art of super-resolution on compressed image and video. The proposed LDV 3.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge is at https://github.com/RenYang-home/AIM22_CompressSR.
2022-08-25 17:31:15 - Skin Lesion Analysis: A State-of-the-Art Survey, Systematic Review, and Future Trends - *Md. Kamrul Hasan, Md. Asif Ahamad, Choon Hwai Yap, Guang Yang* - `2208.12232v1` - [abs](http://arxiv.org/abs/2208.12232v1) - [pdf](http://arxiv.org/pdf/2208.12232v1) > The Computer-aided Diagnosis (CAD) system for skin lesion analysis is an emerging field of research that has the potential to relieve the burden and cost of skin cancer screening. Researchers have recently indicated increasing interest in developing such CAD systems, with the intention of providing a user-friendly tool to dermatologists in order to reduce the challenges that are raised by manual inspection. The purpose of this article is to provide a complete literature review of cutting-edge CAD techniques published between 2011 and 2020. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) method was used to identify a total of 365 publications, 221 for skin lesion segmentation and 144 for skin lesion classification. These articles are analyzed and summarized in a number of different ways so that we can contribute vital information about the methods for the evolution of CAD systems. These ways include: relevant and essential definitions and theories, input data (datasets utilization, preprocessing, augmentations, and fixing imbalance problems), method configuration (techniques, architectures, module frameworks, and losses), training tactics (hyperparameter settings), and evaluation criteria (metrics). We also intend to investigate a variety of performance-enhancing methods, including ensemble and post-processing. In addition, in this survey, we highlight the primary problems associated with evaluating skin lesion segmentation and classification systems using minimal datasets, as well as the potential solutions to these plights. In conclusion, enlightening findings, recommendations, and trends are discussed for the purpose of future research surveillance in related fields of interest. It is foreseen that it will guide researchers of all levels, from beginners to experts, in the process of developing an automated and robust CAD system for skin lesion analysis.
2022-08-25 20:04:11 - Autonomous Unmanned Aerial Vehicle Navigation using Reinforcement Learning: A Systematic Review - *Fadi AlMahamid, Katarina Grolinger* - `2208.12328v1` - [abs](http://arxiv.org/abs/2208.12328v1) - [pdf](http://arxiv.org/pdf/2208.12328v1) > There is an increasing demand for using Unmanned Aerial Vehicle (UAV), known as drones, in different applications such as packages delivery, traffic monitoring, search and rescue operations, and military combat engagements. In all of these applications, the UAV is used to navigate the environment autonomously - without human interaction, perform specific tasks and avoid obstacles. Autonomous UAV navigation is commonly accomplished using Reinforcement Learning (RL), where agents act as experts in a domain to navigate the environment while avoiding obstacles. Understanding the navigation environment and algorithmic limitations plays an essential role in choosing the appropriate RL algorithm to solve the navigation problem effectively. Consequently, this study first identifies the main UAV navigation tasks and discusses navigation frameworks and simulation software. Next, RL algorithms are classified and discussed based on the environment, algorithm characteristics, abilities, and applications in different UAV navigation problems, which will help the practitioners and researchers select the appropriate RL algorithms for their UAV navigation use cases. Moreover, identified gaps and opportunities will drive UAV navigation research.
2022-08-26 06:52:15 - Nuclei & Glands Instance Segmentation in Histology Images: A Narrative Review - *Esha Sadia Nasir, Arshi Perviaz, Muhammad Moazam Fraz* - `2208.12460v1` - [abs](http://arxiv.org/abs/2208.12460v1) - [pdf](http://arxiv.org/pdf/2208.12460v1) > Instance segmentation of nuclei and glands in the histology images is an important step in computational pathology workflow for cancer diagnosis, treatment planning and survival analysis. With the advent of modern hardware, the recent availability of large-scale quality public datasets and the community organized grand challenges have seen a surge in automated methods focusing on domain specific challenges, which is pivotal for technology advancements and clinical translation. In this survey, 126 papers illustrating the AI based methods for nuclei and glands instance segmentation published in the last five years (2017-2022) are deeply analyzed, the limitations of current approaches and the open challenges are discussed. Moreover, the potential future research direction is presented and the contribution of state-of-the-art methods is summarized. Further, a generalized summary of publicly available datasets and a detailed insights on the grand challenges illustrating the top performing methods specific to each challenge is also provided. Besides, we intended to give the reader current state of existing research and pointers to the future directions in developing methods that can be used in clinical practice enabling improved diagnosis, grading, prognosis, and treatment planning of cancer. To the best of our knowledge, no previous work has reviewed the instance segmentation in histology images focusing towards this direction.
2022-08-26 15:01:26 - A Comprehensive Review of Digital Twin -- Part 1: Modeling and Twinning Enabling Technologies - *Adam Thelen, Xiaoge Zhang, Olga Fink, Yan Lu, Sayan Ghosh, Byeng D. Youn, Michael D. Todd, Sankaran Mahadevan, Chao Hu, Zhen Hu* - `2208.14197v1` - [abs](http://arxiv.org/abs/2208.14197v1) - [pdf](http://arxiv.org/pdf/2208.14197v1) > As an emerging technology in the era of Industry 4.0, digital twin is gaining unprecedented attention because of its promise to further optimize process design, quality control, health monitoring, decision and policy making, and more, by comprehensively modeling the physical world as a group of interconnected digital models. In a two-part series of papers, we examine the fundamental role of different modeling techniques, twinning enabling technologies, and uncertainty quantification and optimization methods commonly used in digital twins. This first paper presents a thorough literature review of digital twin trends across many disciplines currently pursuing this area of research. Then, digital twin modeling and twinning enabling technologies are further analyzed by classifying them into two main categories: physical-to-virtual, and virtual-to-physical, based on the direction in which data flows. Finally, this paper provides perspectives on the trajectory of digital twin technology over the next decade, and introduces a few emerging areas of research which will likely be of great use in future digital twin research. In part two of this review, the role of uncertainty quantification and optimization are discussed, a battery digital twin is demonstrated, and more perspectives on the future of digital twin are shared.
2022-08-26 15:09:18 - Federated and Privacy-Preserving Learning of Accounting Data in Financial Statement Audits - *Marco Schreyer, Timur Sattarov, Damian Borth* - `2208.12708v1` - [abs](http://arxiv.org/abs/2208.12708v1) - [pdf](http://arxiv.org/pdf/2208.12708v1) > The ongoing 'digital transformation' fundamentally changes audit evidence's nature, recording, and volume. Nowadays, the International Standards on Auditing (ISA) requires auditors to examine vast volumes of a financial statement's underlying digital accounting records. As a result, audit firms also 'digitize' their analytical capabilities and invest in Deep Learning (DL), a successful sub-discipline of Machine Learning. The application of DL offers the ability to learn specialized audit models from data of multiple clients, e.g., organizations operating in the same industry or jurisdiction. In general, regulations require auditors to adhere to strict data confidentiality measures. At the same time, recent intriguing discoveries showed that large-scale DL models are vulnerable to leaking sensitive training data information. Today, it often remains unclear how audit firms can apply DL models while complying with data protection regulations. In this work, we propose a Federated Learning framework to train DL models on auditing relevant accounting data of multiple clients. The framework encompasses Differential Privacy and Split Learning capabilities to mitigate data confidentiality risks at model inference. We evaluate our approach to detect accounting anomalies in three real-world datasets of city payments. Our results provide empirical evidence that auditors can benefit from DL models that accumulate knowledge from multiple sources of proprietary client data.
2022-08-26 15:11:10 - SeSQL: Yet Another Large-scale Session-level Chinese Text-to-SQL Dataset - *Saihao Huang, Lijie Wang, Zhenghua Li, Zeyang Liu, Chenhui Dou, Fukang Yan, Xinyan Xiao, Hua Wu, Min Zhang* - `2208.12711v1` - [abs](http://arxiv.org/abs/2208.12711v1) - [pdf](http://arxiv.org/pdf/2208.12711v1) > As the first session-level Chinese dataset, CHASE contains two separate parts, i.e., 2,003 sessions manually constructed from scratch (CHASE-C), and 3,456 sessions translated from English SParC (CHASE-T). We find the two parts are highly discrepant and incompatible as training and evaluation data. In this work, we present SeSQL, yet another large-scale session-level text-to-SQL dataset in Chinese, consisting of 5,028 sessions all manually constructed from scratch. In order to guarantee data quality, we adopt an iterative annotation workflow to facilitate intense and in-time review of previous-round natural language (NL) questions and SQL queries. Moreover, by completing all context-dependent NL questions, we obtain 27,012 context-independent question/SQL pairs, allowing SeSQL to be used as the largest dataset for single-round multi-DB text-to-SQL parsing. We conduct benchmark session-level text-to-SQL parsing experiments on SeSQL by employing three competitive session-level parsers, and present detailed analysis.
2022-08-26 17:20:58 - Learning and Compositionality: a Unification Attempt via Connectionist Probabilistic Programming - *Ximing Qiao, Hai Li* - `2208.12789v1` - [abs](http://arxiv.org/abs/2208.12789v1) - [pdf](http://arxiv.org/pdf/2208.12789v1) > We consider learning and compositionality as the key mechanisms towards simulating human-like intelligence. While each mechanism is successfully achieved by neural networks and symbolic AIs, respectively, it is the combination of the two mechanisms that makes human-like intelligence possible. Despite the numerous attempts on building hybrid neuralsymbolic systems, we argue that our true goal should be unifying learning and compositionality, the core mechanisms, instead of neural and symbolic methods, the surface approaches to achieve them. In this work, we review and analyze the strengths and weaknesses of neural and symbolic methods by separating their forms and meanings (structures and semantics), and propose Connectionist Probabilistic Program (CPPs), a framework that connects connectionist structures (for learning) and probabilistic program semantics (for compositionality). Under the framework, we design a CPP extension for small scale sequence modeling and provide a learning algorithm based on Bayesian inference. Although challenges exist in learning complex patterns without supervision, our early results demonstrate CPP's successful extraction of concepts and relations from raw sequential data, an initial step towards compositional learning.
2022-08-26 19:45:51 - What Do NLP Researchers Believe? Results of the NLP Community Metasurvey - *Julian Michael, Ari Holtzman, Alicia Parrish, Aaron Mueller, Alex Wang, Angelica Chen, Divyam Madaan, Nikita Nangia, Richard Yuanzhe Pang, Jason Phang, Samuel R. Bowman* - `2208.12852v1` - [abs](http://arxiv.org/abs/2208.12852v1) - [pdf](http://arxiv.org/pdf/2208.12852v1) > We present the results of the NLP Community Metasurvey. Run from May to June 2022, the survey elicited opinions on controversial issues, including industry influence in the field, concerns about AGI, and ethics. Our results put concrete numbers to several controversies: For example, respondents are split almost exactly in half on questions about the importance of artificial general intelligence, whether language models understand language, and the necessity of linguistic structure and inductive bias for solving NLP problems. In addition, the survey posed meta-questions, asking respondents to predict the distribution of survey responses. This allows us not only to gain insight on the spectrum of beliefs held by NLP researchers, but also to uncover false sociological beliefs where the community's predictions don't match reality. We find such mismatches on a wide range of issues. Among other results, the community greatly overestimates its own belief in the usefulness of benchmarks and the potential for scaling to solve real-world problems, while underestimating its own belief in the importance of linguistic structure, inductive bias, and interdisciplinary science.
2022-08-26 22:19:50 - Multi-Modality Cardiac Image Computing: A Survey - *Lei Li, Wangbin Ding, Liqun Huang, Xiahai Zhuang, Vicente Grau* - `2208.12881v1` - [abs](http://arxiv.org/abs/2208.12881v1) - [pdf](http://arxiv.org/pdf/2208.12881v1) > Multi-modality cardiac imaging plays a key role in the management of patients with cardiovascular diseases. It allows a combination of complementary anatomical, morphological and functional information, increases diagnosis accuracy, and improves the efficacy of cardiovascular interventions and clinical outcomes. Fully-automated processing and quantitative analysis of multi-modality cardiac images could have a direct impact on clinical research and evidence-based patient management. However, these require overcoming significant challenges including inter-modality misalignment and finding optimal methods to integrate information from different modalities. This paper aims to provide a comprehensive review of multi-modality imaging in cardiology, the computing methods, the validation strategies, the related clinical workflows and future perspectives. For the computing methodologies, we have a favored focus on the three tasks, i.e., registration, fusion and segmentation, which generally involve multi-modality imaging data, \textit{either combining information from different modalities or transferring information across modalities}. The review highlights that multi-modality cardiac imaging data has the potential of wide applicability in the clinic, such as trans-aortic valve implantation guidance, myocardial viability assessment, and catheter ablation therapy and its patient selection. Nevertheless, many challenges remain unsolved, such as missing modality, combination of imaging and non-imaging data, and uniform analysis and representation of different modalities. There is also work to do in defining how the well-developed techniques fit in clinical workflows and how much additional and relevant information they introduce. These problems are likely to continue to be an active field of research and the questions to be answered in the future.
2022-08-27 00:32:29 - Comparison and Analysis of Image-to-Image Generative Adversarial Networks: A Survey - *Sagar Saxena, Mohammad Nayeem Teli* - `2112.12625v2` - [abs](http://arxiv.org/abs/2112.12625v2) - [pdf](http://arxiv.org/pdf/2112.12625v2) > Generative Adversarial Networks (GANs) have recently introduced effective methods of performing Image-to-Image translations. These models can be applied and generalized to a variety of domains in Image-to-Image translation without changing any parameters. In this paper, we survey and analyze eight Image-to-Image Generative Adversarial Networks: Pix2Pix, CycleGAN, CoGAN, StarGAN, MUNIT, StarGAN2, DA-GAN, and Self Attention GAN. Each of these models presented state-of-the-art results and introduced new techniques to build Image-to-Image GANs. In addition to a survey of the models, we also survey the 18 datasets they were trained on and the 9 metrics they were evaluated on. Finally, we present results of a controlled experiment for 6 of these models on a common set of metrics and datasets. The results were mixed and showed that, on certain datasets, tasks, and metrics, some models outperformed others. The last section of this paper discusses those results and establishes areas of future research. As researchers continue to innovate new Image-to-Image GANs, it is important to gain a good understanding of the existing methods, datasets, and metrics. This paper provides a comprehensive overview and discussion to help build this foundation.
2022-08-27 13:54:51 - Pervasive AI for IoT applications: A Survey on Resource-efficient Distributed Artificial Intelligence - *Emna Baccour, Naram Mhaisen, Alaa Awad Abdellatif, Aiman Erbad, Amr Mohamed, Mounir Hamdi, Mohsen Guizani* - `2105.01798v2` - [abs](http://arxiv.org/abs/2105.01798v2) - [pdf](http://arxiv.org/pdf/2105.01798v2) > Artificial intelligence (AI) has witnessed a substantial breakthrough in a variety of Internet of Things (IoT) applications and services, spanning from recommendation systems to robotics control and military surveillance. This is driven by the easier access to sensory data and the enormous scale of pervasive/ubiquitous devices that generate zettabytes (ZB) of real-time data streams. Designing accurate models using such data streams, to predict future insights and revolutionize the decision-taking process, inaugurates pervasive systems as a worthy paradigm for a better quality-of-life. The confluence of pervasive computing and artificial intelligence, Pervasive AI, expanded the role of ubiquitous IoT systems from mainly data collection to executing distributed computations with a promising alternative to centralized learning, presenting various challenges. In this context, a wise cooperation and resource scheduling should be envisaged among IoT devices (e.g., smartphones, smart vehicles) and infrastructure (e.g. edge nodes, and base stations) to avoid communication and computation overheads and ensure maximum performance. In this paper, we conduct a comprehensive survey of the recent techniques developed to overcome these resource challenges in pervasive AI systems. Specifically, we first present an overview of the pervasive computing, its architecture, and its intersection with artificial intelligence. We then review the background, applications and performance metrics of AI, particularly Deep Learning (DL) and online learning, running in a ubiquitous system. Next, we provide a deep literature review of communication-efficient techniques, from both algorithmic and system perspectives, of distributed inference, training and online learning tasks across the combination of IoT devices, edge devices and cloud servers. Finally, we discuss our future vision and research challenges.
2022-08-27 14:44:18 - Multi-Outputs Is All You Need For Deblur - *Sidun Liu, Peng Qiao, Yong Dou* - `2208.13029v1` - [abs](http://arxiv.org/abs/2208.13029v1) - [pdf](http://arxiv.org/pdf/2208.13029v1) > Image deblurring task is an ill-posed one, where exists infinite feasible solutions for blurry image. Modern deep learning approaches usually discard the learning of blur kernels and directly employ end-to-end supervised learning. Popular deblurring datasets define the label as one of the feasible solutions. However, we argue that it's not reasonable to specify a label directly, especially when the label is sampled from a random distribution. Therefore, we propose to make the network learn the distribution of feasible solutions, and design based on this consideration a novel multi-head output architecture and corresponding loss function for distribution learning. Our approach enables the model to output multiple feasible solutions to approximate the target distribution. We further propose a novel parameter multiplexing method that reduces the number of parameters and computational effort while improving performance. We evaluated our approach on multiple image-deblur models, including the current state-of-the-art NAFNet. The improvement of best overall (pick the highest score among multiple heads for each validation image) PSNR outperforms the compared baselines up to 0.11~0.18dB. The improvement of the best single head (pick the best-performed head among multiple heads on validation set) PSNR outperforms the compared baselines up to 0.04~0.08dB. The codes are available at https://github.com/Liu-SD/multi-output-deblur.
2022-08-27 22:30:42 - Quantifying the Suicidal Tendency on Social Media: A Survey - *Muskan Garg* - `2110.03663v3` - [abs](http://arxiv.org/abs/2110.03663v3) - [pdf](http://arxiv.org/pdf/2110.03663v3) > Amid lockdown period more people express their feelings over social media platforms due to closed third-place and academic researchers have witnessed strong associations between the mental healthcare and social media posts. The stress for a brief period may lead to clinical depressions and the long-lasting traits of prevailing depressions can be life threatening with suicidal ideation as the possible outcome. The increasing concern towards the rise in number of suicide cases is because it is one of the leading cause of premature but preventable death. Recent studies have shown that mining social media data has helped in quantifying the suicidal tendency of users at risk. This potential manuscript elucidates the taxonomy of mental healthcare and highlights some recent attempts in examining the potential of quantifying suicidal tendency on social media data. This manuscript presents the classification of heterogeneous features from social media data and handling feature vector representation. Aiming to identify the new research directions and advances in the development of Machine Learning (ML) and Deep Learning (DL) based models, a quantitative synthesis and a qualitative review was carried out with corpus of over 77 potential research articles related to stress, depression and suicide risk from 2013 to 2021.
2022-08-28 02:43:47 - Improved and Interpretable Defense to Transferred Adversarial Examples by Jacobian Norm with Selective Input Gradient Regularization - *Deyin Liu, Lin Wu, Lingqiao Liu, Haifeng Zhao, Farid Boussaid, Mohammed Bennamoun* - `2207.13036v3` - [abs](http://arxiv.org/abs/2207.13036v3) - [pdf](http://arxiv.org/pdf/2207.13036v3) > Deep neural networks (DNNs) are known to be vulnerable to adversarial examples that are crafted with imperceptible perturbations, i.e., a small change in an input image can induce a mis-classification, and thus threatens the reliability of deep learning based deployment systems. Adversarial training (AT) is often adopted to improve robustness through training a mixture of corrupted and clean data. However, most of AT based methods are ineffective in dealing with transferred adversarial examples which are generated to fool a wide spectrum of defense models, and thus cannot satisfy the generalization requirement raised in real-world scenarios. Moreover, adversarially training a defense model in general cannot produce interpretable predictions towards the inputs with perturbations, whilst a highly interpretable robust model is required by different domain experts to understand the behaviour of a DNN. In this work, we propose a novel approach based on Jacobian norm and Selective Input Gradient Regularization (J-SIGR), which suggests the linearized robustness through Jacobian normalization and also regularizes the perturbation-based saliency maps to imitate the model's interpretable predictions. As such, we achieve both the improved defense and high interpretability of DNNs. Finally, we evaluate our method across different architectures against powerful adversarial attacks. Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks, and we also show that the predictions from the neural network are easy to interpret.
2022-08-28 06:19:32 - Interpretable (not just posthoc-explainable) medical claims modeling for discharge placement to prevent avoidable all-cause readmissions or death - *Joshua C. Chang, Ted L. Chang, Carson C. Chow, Rohit Mahajan, Sonya Mahajan, Shashaank Vattikuti, Hongjing Xia* - `2208.12814v1` - [abs](http://arxiv.org/abs/2208.12814v1) - [pdf](http://arxiv.org/pdf/2208.12814v1) > This manuscript addresses the simultaneous problems of predicting all-cause inpatient readmission or death after discharge, and quantifying the impact of discharge placement in preventing these adverse events. To this end, we developed an inherently interpretable multilevel Bayesian modeling framework inspired by the piecewise linearity of ReLU-activated deep neural networks. In a survival model, we explicitly adjust for confounding in quantifying local average treatment effects for discharge placement interventions. We trained the model on a 5% sample of Medicare beneficiaries from 2008 and 2011, and then tested the model on 2012 claims. Evaluated on classification accuracy for 30-day all-cause unplanned readmissions (defined using official CMS methodology) or death, the model performed similarly against XGBoost, logistic regression (after feature engineering), and a Bayesian deep neural network trained on the same data. Tested on the 30-day classification task of predicting readmissions or death using left-out future data, the model achieved an AUROC of approximately 0.76 and and AUPRC of approximately 0.50 (relative to an overall positively rate in the testing data of 18%), demonstrating how one need not sacrifice interpretability for accuracy. Additionally, the model had a testing AUROC of 0.78 on the classification of 90-day all-cause unplanned readmission or death. We easily peer into our inherently interpretable model, summarizing its main findings. Additionally, we demonstrate how the black-box posthoc explainer tool SHAP generates explanations that are not supported by the fitted model -- and if taken at face value does not offer enough context to make a model actionable.
2022-08-28 15:03:08 - Generic tool for numerical simulation of transformation-diffusion processes in complex volume geometric shapes: application to microbial decomposition of organic matter - *Monga Olivier, Hecht Frédéric, Moto Serge, Klai Mouad, Mbe Bruno, Dias Jorge, Garnier Patricia, Pot Valérie* - `2110.03130v3` - [abs](http://arxiv.org/abs/2110.03130v3) - [pdf](http://arxiv.org/pdf/2110.03130v3) > This paper presents a generic framework for the numerical simulation of transformation-diffusion processes in complex volume geometric shapes. This work follows a previous one devoted to the simulation of microbial degradation of organic matter in porous system at microscopic scale. We generalized and improved the MOSAIC method significantly and thus yielding a much more generic and efficient numerical simulation scheme. In particular, regarding the simulation of diffusion processes from the graph, in this study we proposed a completely explicit and semi-implicit numerical scheme that can significantly reduce the computational complexity. We validated our method by comparing the results to the one provided by classical Lattice Boltzmann Method (LBM) within the context of microbial decomposition simulation. For the same datasets, we obtained similar results in a significantly shorter computing time (i.e., 10-15 minutes) than the prior work (several hours). Besides the classical LBM method takes around 3 weeks computing time.
2022-08-28 16:33:48 - AutoQML: Automatic Generation and Training of Robust Quantum-Inspired Classifiers by Using Genetic Algorithms on Grayscale Images - *Sergio Altares-López, Juan José García-Ripoll, Angela Ribeiro* - `2208.13246v1` - [abs](http://arxiv.org/abs/2208.13246v1) - [pdf](http://arxiv.org/pdf/2208.13246v1) > We propose a new hybrid system for automatically generating and training quantum-inspired classifiers on grayscale images by using multiobjective genetic algorithms. We define a dynamic fitness function to obtain the smallest possible circuit and highest accuracy on unseen data, ensuring that the proposed technique is generalizable and robust. We minimize the complexity of the generated circuits in terms of the number of entanglement gates by penalizing their appearance. We reduce the size of the images with two dimensionality reduction approaches: principal component analysis (PCA), which is encoded in the individual for optimization purpose, and a small convolutional autoencoder (CAE). These two methods are compared with one another and with a classical nonlinear approach to understand their behaviors and to ensure that the classification ability is due to the quantum circuit and not the preprocessing technique used for dimensionality reduction.
2022-08-28 20:29:49 - Reinforcement Learning for Ridesharing: An Extended Survey - *Zhiwei Qin, Hongtu Zhu, Jieping Ye* - `2105.01099v7` - [abs](http://arxiv.org/abs/2105.01099v7) - [pdf](http://arxiv.org/pdf/2105.01099v7) > In this paper, we present a comprehensive, in-depth survey of the literature on reinforcement learning approaches to decision optimization problems in a typical ridesharing system. Papers on the topics of rideshare matching, vehicle repositioning, ride-pooling, routing, and dynamic pricing are covered. Most of the literature has appeared in the last few years, and several core challenges are to continue to be tackled: model complexity, agent coordination, and joint optimization of multiple levers. Hence, we also introduce popular data sets and open simulation environments to facilitate further research and development. Subsequently, we discuss a number of challenges and opportunities for reinforcement learning research on this important domain.
2022-08-29 00:25:23 - Worldwide AI Ethics: a review of 200 guidelines and recommendations for AI governance - *Nicholas Kluge Corrêa, Camila Galvão, James William Santos, Carolina Del Pino, Edson Pontes Pinto, Camila Barbosa, Diogo Massmann, Rodrigo Mambrini, Luiza Galvão, Edmund Terem* - `2206.11922v3` - [abs](http://arxiv.org/abs/2206.11922v3) - [pdf](http://arxiv.org/pdf/2206.11922v3) > In the last decade, a great number of organizations have produced documents intended to standardize, in the normative sense, and promote guidance to our recent and rapid AI development. However, the full content and divergence of ideas presented in these documents have not yet been analyzed, except for a few meta-analyses and critical reviews of the field. In this work, we seek to expand on the work done by past researchers and create a tool for better data visualization of the contents and nature of these documents. We also provide our critical analysis of the results acquired by the application of our tool into a sample size of 200 documents.
2022-08-29 02:29:07 - Artificial Neural Networks for Finger Vein Recognition: A Survey - *Yimin Yin, Renye Zhang, Pengfei Liu, Wanxia Deng, Siliang He, Chen Li, Jinghua Zhang* - `2208.13341v1` - [abs](http://arxiv.org/abs/2208.13341v1) - [pdf](http://arxiv.org/pdf/2208.13341v1) > Finger vein recognition is an emerging biometric recognition technology. Different from the other biometric features on the body surface, the venous vascular tissue of the fingers is buried deep inside the skin. Due to this advantage, finger vein recognition is highly stable and private. They are almost impossible to be stolen and difficult to interfere with by external conditions. Unlike the finger vein recognition methods based on traditional machine learning, the artificial neural network technique, especially deep learning, it without relying on feature engineering and have superior performance. To summarize the development of finger vein recognition based on artificial neural networks, this paper collects 149 related papers. First, we introduce the background of finger vein recognition and the motivation of this survey. Then, the development history of artificial neural networks and the representative networks on finger vein recognition tasks are introduced. The public datasets that are widely used in finger vein recognition are then described. After that, we summarize the related finger vein recognition tasks based on classical neural networks and deep neural networks, respectively. Finally, the challenges and potential development directions in finger vein recognition are discussed. To our best knowledge, this paper is the first comprehensive survey focusing on finger vein recognition based on artificial neural networks.
2022-08-29 03:14:53 - Strategyproofing Peer Assessment via Partitioning: The Price in Terms of Evaluators' Expertise - *Komal Dhull, Steven Jecmen, Pravesh Kothari, Nihar B. Shah* - `2201.10631v3` - [abs](http://arxiv.org/abs/2201.10631v3) - [pdf](http://arxiv.org/pdf/2201.10631v3) > Strategic behavior is a fundamental problem in a variety of real-world applications that require some form of peer assessment, such as peer grading of homeworks, grant proposal review, conference peer review of scientific papers, and peer assessment of employees in organizations. Since an individual's own work is in competition with the submissions they are evaluating, they may provide dishonest evaluations to increase the relative standing of their own submission. This issue is typically addressed by partitioning the individuals and assigning them to evaluate the work of only those from different subsets. Although this method ensures strategyproofness, each submission may require a different type of expertise for effective evaluation. In this paper, we focus on finding an assignment of evaluators to submissions that maximizes assigned evaluators' expertise subject to the constraint of strategyproofness. We analyze the price of strategyproofness: that is, the amount of compromise on the assigned evaluators' expertise required in order to get strategyproofness. We establish several polynomial-time algorithms for strategyproof assignment along with assignment-quality guarantees. Finally, we evaluate the methods on a dataset from conference peer review.
2022-08-29 06:35:01 - Deep Depth Completion from Extremely Sparse Data: A Survey - *Junjie Hu, Chenyu Bao, Mete Ozay, Chenyou Fan, Qing Gao, Honghai Liu, Tin Lun Lam* - `2205.05335v3` - [abs](http://arxiv.org/abs/2205.05335v3) - [pdf](http://arxiv.org/pdf/2205.05335v3) > Depth completion aims at predicting dense pixel-wise depth from an extremely sparse map captured from a depth sensor, e.g., LiDARs. It plays an essential role in various applications such as autonomous driving, 3D reconstruction, augmented reality, and robot navigation. Recent successes on the task have been demonstrated and dominated by deep learning based solutions. In this article, for the first time, we provide a comprehensive literature review that helps readers better grasp the research trends and clearly understand the current advances. We investigate the related studies from the design aspects of network architectures, loss functions, benchmark datasets, and learning strategies with a proposal of a novel taxonomy that categorizes existing methods. Besides, we present a quantitative comparison of model performance on three widely used benchmarks, including indoor and outdoor datasets. Finally, we discuss the challenges of prior works and provide readers with some insights for future research directions.
2022-08-29 10:05:49 - Federated Zero-Shot Learning with Mid-Level Semantic Knowledge Transfer - *Shitong Sun, Chenyang Si, Shaogang Gong, Guile Wu* - `2208.13465v1` - [abs](http://arxiv.org/abs/2208.13465v1) - [pdf](http://arxiv.org/pdf/2208.13465v1) > Conventional centralised deep learning paradigms are not feasible when data from different sources cannot be shared due to data privacy or transmission limitation. To resolve this problem, federated learning has been introduced to transfer knowledge across multiple sources (clients) with non-shared data while optimising a globally generalised central model (server). Existing federated learning paradigms mostly focus on transferring holistic high-level knowledge (such as class) across models, which are closely related to specific objects of interest so may suffer from inverse attack. In contrast, in this work, we consider transferring mid-level semantic knowledge (such as attribute) which is not sensitive to specific objects of interest and therefore is more privacy-preserving and scalable. To this end, we formulate a new Federated Zero-Shot Learning (FZSL) paradigm to learn mid-level semantic knowledge at multiple local clients with non-shared local data and cumulatively aggregate a globally generalised central model for deployment. To improve model discriminative ability, we propose to explore semantic knowledge augmentation from external knowledge for enriching the mid-level semantic space in FZSL. Extensive experiments on five zeroshot learning benchmark datasets validate the effectiveness of our approach for optimising a generalisable federated learning model with mid-level semantic knowledge transfer.
2022-08-29 11:32:20 - A Deep Convolutional Neural Networks Based Multi-Task Ensemble Model for Aspect and Polarity Classification in Persian Reviews - *Milad Vazan, Fatemeh Sadat Masoumi, Sepideh Saeedi Majd* - `2201.06313v3` - [abs](http://arxiv.org/abs/2201.06313v3) - [pdf](http://arxiv.org/pdf/2201.06313v3) > Aspect-based sentiment analysis is of great importance and application because of its ability to identify all aspects discussed in the text. However, aspect-based sentiment analysis will be most effective when, in addition to identifying all the aspects discussed in the text, it can also identify their polarity. Most previous methods use the pipeline approach, that is, they first identify the aspects and then identify the polarities. Such methods are unsuitable for practical applications since they can lead to model errors. Therefore, in this study, we propose a multi-task learning model based on Convolutional Neural Networks (CNNs), which can simultaneously detect aspect category and detect aspect category polarity. creating a model alone may not provide the best predictions and lead to errors such as bias and high variance. To reduce these errors and improve the efficiency of model predictions, combining several models known as ensemble learning may provide better results. Therefore, the main purpose of this article is to create a model based on an ensemble of multi-task deep convolutional neural networks to enhance sentiment analysis in Persian reviews. We evaluated the proposed method using a Persian language dataset in the movie domain. Jacquard index and Hamming loss measures were used to evaluate the performance of the developed models. The results indicate that this new approach increases the efficiency of the sentiment analysis model in the Persian language.
2022-08-29 13:26:20 - Spatio-Temporal Wind Speed Forecasting using Graph Networks and Novel Transformer Architectures - *Lars Ødegaard Bentsen, Narada Dilp Warakagoda, Roy Stenbro, Paal Engelstad* - `2208.13585v1` - [abs](http://arxiv.org/abs/2208.13585v1) - [pdf](http://arxiv.org/pdf/2208.13585v1) > To improve the security and reliability of wind energy production, short-term forecasting has become of utmost importance. This study focuses on multi-step spatio-temporal wind speed forecasting for the Norwegian continental shelf. A graph neural network (GNN) architecture was used to extract spatial dependencies, with different update functions to learn temporal correlations. These update functions were implemented using different neural network architectures. One such architecture, the Transformer, has become increasingly popular for sequence modelling in recent years. Various alterations of the original architecture have been proposed to better facilitate time-series forecasting, of which this study focused on the Informer, LogSparse Transformer and Autoformer. This is the first time the LogSparse Transformer and Autoformer have been applied to wind forecasting and the first time any of these or the Informer have been formulated in a spatio-temporal setting for wind forecasting. By comparing against spatio-temporal Long Short-Term Memory (LSTM) and Multi-Layer Perceptron (MLP) models, the study showed that the models using the altered Transformer architectures as update functions in GNNs were able to outperform these. Furthermore, we propose the Fast Fourier Transformer (FFTransformer), which is a novel Transformer architecture based on signal decomposition and consists of two separate streams that analyse trend and periodic components separately. The FFTransformer and Autoformer were found to achieve superior results for the 10-minute and 1-hour ahead forecasts, with the FFTransformer significantly outperforming all other models for the 4-hour ahead forecasts. Finally, by varying the degree of connectivity for the graph representations, the study explicitly demonstrates how all models were able to leverage spatial dependencies to improve local short-term wind speed forecasting.
2022-08-29 14:24:13 - A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions - *Bowen Qin, Binyuan Hui, Lihan Wang, Min Yang, Jinyang Li, Binhua Li, Ruiying Geng, Rongyu Cao, Jian Sun, Luo Si, Fei Huang, Yongbin Li* - `2208.13629v1` - [abs](http://arxiv.org/abs/2208.13629v1) - [pdf](http://arxiv.org/pdf/2208.13629v1) > Text-to-SQL parsing is an essential and challenging task. The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) based on the evidences provided by relational databases. Early text-to-SQL parsing systems from the database community achieved a noticeable progress with the cost of heavy human engineering and user interactions with the systems. In recent years, deep neural networks have significantly advanced this task by neural generation models, which automatically learn a mapping function from an input NL question to an output SQL query. Subsequently, the large pre-trained language models have taken the state-of-the-art of the text-to-SQL parsing task to a new level. In this survey, we present a comprehensive review on deep learning approaches for text-to-SQL parsing. First, we introduce the text-to-SQL parsing corpora which can be categorized as single-turn and multi-turn. Second, we provide a systematical overview of pre-trained language models and existing methods for text-to-SQL parsing. Third, we present readers with the challenges faced by text-to-SQL parsing and explore some potential future directions in this field.
2022-08-29 14:51:57 - How to Teach: Learning Data-Free Knowledge Distillation from Curriculum - *Jingru Li, Sheng Zhou, Liangcheng Li, Xifeng Yan, Zhi Yu, Jiajun Bu* - `2208.13648v1` - [abs](http://arxiv.org/abs/2208.13648v1) - [pdf](http://arxiv.org/pdf/2208.13648v1) > Data-free knowledge distillation (DFKD) aims at training lightweight student networks from teacher networks without training data. Existing approaches mainly follow the paradigm of generating informative samples and progressively updating student models by targeting data priors, boundary samples or memory samples. However, it is difficult for the previous DFKD methods to dynamically adjust the generation strategy at different training stages, which in turn makes it difficult to achieve efficient and stable training. In this paper, we explore how to teach students the model from a curriculum learning (CL) perspective and propose a new approach, namely "CuDFKD", i.e., "Data-Free Knowledge Distillation with Curriculum". It gradually learns from easy samples to difficult samples, which is similar to the way humans learn. In addition, we provide a theoretical analysis of the majorization minimization (MM) algorithm and explain the convergence of CuDFKD. Experiments conducted on benchmark datasets show that with a simple course design strategy, CuDFKD achieves the best performance over state-of-the-art DFKD methods and different benchmarks, such as 95.28\% top1 accuracy of the ResNet18 model on CIFAR10, which is better than training from scratch with data. The training is fast, reaching the highest accuracy of 90\% within 30 epochs, and the variance during training is stable. Also in this paper, the applicability of CuDFKD is also analyzed and discussed.
2022-08-29 18:52:58 - Evolving Label Usage within Generation Z when Self-Describing Sexual Orientation - *Wilson Y. Lee, J. Nicholas Hobbs* - `2208.13833v1` - [abs](http://arxiv.org/abs/2208.13833v1) - [pdf](http://arxiv.org/pdf/2208.13833v1) > Evaluating change in ranked term importance in a growing corpus is a powerful tool for understanding changes in vocabulary usage. In this paper, we analyze a corpus of free-response answers where 33,993 LGBTQ Generation Z respondents from age 13 to 24 in the United States are asked to self-describe their sexual orientation. We observe that certain labels, such as bisexual, pansexual, and lesbian, remain equally important across age groups. The importance of other labels, such as homosexual, demisexual, and omnisexual, evolve across age groups. Although Generation Z is often stereotyped as homogenous, we observe noticeably different label usage when self-describing sexual orientation within it. We urge that interested parties must routinely survey the most important sexual orientation labels to their target audience and refresh their materials (such as demographic surveys) to reflect the constantly evolving LGBTQ community and create an inclusive environment.
2022-08-30 02:01:35 - Selection of a representative sorting model in a preference disaggregation setting: a review of existing procedures, new proposals, and experimental comparison - *Michał Wójcik, Miłosz Kadziński, Krzysztof Ciomek* - `2209.02410v1` - [abs](http://arxiv.org/abs/2209.02410v1) - [pdf](http://arxiv.org/pdf/2209.02410v1) > We consider preference disaggregation in the context of multiple criteria sorting. The value function parameters and thresholds separating the classes are inferred from the Decision Maker's (DM's) assignment examples. Given the multiplicity of sorting models compatible with indirect preferences, selecting a single, representative one can be conducted differently. We review several procedures for this purpose, aiming to identify the most discriminant, average, central, benevolent, aggressive, parsimonious, or robust models. Also, we present three novel procedures that implement the robust assignment rule in practice. They exploit stochastic acceptabilities and maximize the support given to the resulting assignments by all feasible sorting models. The performance of sixteen procedures is verified on problem instances with different complexities. The results of an experimental study indicate the most efficient procedure in terms of classification accuracy, reproducing the DM's model, and delivering the most robust assignments. These include approaches identifying differently interpreted centers of the feasible polyhedron and robust methods introduced in this paper. Moreover, we discuss how the performance of all procedures is affected by different numbers of classes, criteria, characteristic points, and reference assignments. Finally, we illustrate the use of all approaches in a study concerning the assessment of the green performance of European cities.
2022-08-30 04:34:32 - A Survey on Cross-Lingual Summarization - *Jiaan Wang, Fandong Meng, Duo Zheng, Yunlong Liang, Zhixu Li, Jianfeng Qu, Jie Zhou* - `2203.12515v2` - [abs](http://arxiv.org/abs/2203.12515v2) - [pdf](http://arxiv.org/pdf/2203.12515v2) > Cross-lingual summarization is the task of generating a summary in one language (e.g., English) for the given document(s) in a different language (e.g., Chinese). Under the globalization background, this task has attracted increasing attention of the computational linguistics community. Nevertheless, there still remains a lack of comprehensive review for this task. Therefore, we present the first systematic critical review on the datasets, approaches, and challenges in this field. Specifically, we carefully organize existing datasets and approaches according to different construction methods and solution paradigms, respectively. For each type of datasets or approaches, we thoroughly introduce and summarize previous efforts and further compare them with each other to provide deeper analyses. In the end, we also discuss promising directions and offer our thoughts to facilitate future research. This survey is for both beginners and experts in cross-lingual summarization, and we hope it will serve as a starting point as well as a source of new ideas for researchers and engineers interested in this area.
2022-08-30 17:45:38 - Correct-by-Construction Runtime Enforcement in AI -- A Survey - *Bettina Könighofer, Roderick Bloem, Rüdiger Ehlers, Christian Pek* - `2208.14426v1` - [abs](http://arxiv.org/abs/2208.14426v1) - [pdf](http://arxiv.org/pdf/2208.14426v1) > Runtime enforcement refers to the theories, techniques, and tools for enforcing correct behavior with respect to a formal specification of systems at runtime. In this paper, we are interested in techniques for constructing runtime enforcers for the concrete application domain of enforcing safety in AI. We discuss how safety is traditionally handled in the field of AI and how more formal guarantees on the safety of a self-learning agent can be given by integrating a runtime enforcer. We survey a selection of work on such enforcers, where we distinguish between approaches for discrete and continuous action spaces. The purpose of this paper is to foster a better understanding of advantages and limitations of different enforcement techniques, focusing on the specific challenges that arise due to their application in AI. Finally, we present some open challenges and avenues for future work.
2022-08-31 06:59:36 - Let Me Check the Examples: Enhancing Demonstration Learning via Explicit Imitation - *Sirui Wang, Kaiwen Wei, Hongzhi Zhang, Yuntao Li, Wei Wu* - `2209.00455v1` - [abs](http://arxiv.org/abs/2209.00455v1) - [pdf](http://arxiv.org/pdf/2209.00455v1) > Demonstration learning aims to guide the prompt prediction via providing answered demonstrations in the few shot settings. Despite achieving promising results, existing work only concatenates the answered examples as demonstrations to the prompt template (including the raw context) without any additional operation, neglecting the prompt-demonstration dependencies. Besides, prior research found that randomly replacing the labels of demonstrations marginally hurts performance, illustrating that the model could not properly learn the knowledge brought by the demonstrations. Inspired by the human learning process, in this paper, we introduce Imitation DEMOnstration Learning (Imitation-Demo) to strengthen demonstration learning via explicitly imitating human review behaviour, which includes: (1) contrastive learning mechanism to concentrate on the similar demonstrations. (2) demonstration-label re-prediction method to consolidate known knowledge. Experiment results show that our proposed method achieves state-of-the-art performance on 11 out of 14 classification corpora. Further studies also prove that Imitation-Demo strengthen the association between prompt and demonstrations, which could provide the basis for exploring how demonstration learning works.
2022-08-31 07:44:27 - NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results - *Ren Yang, Radu Timofte, Jing Liu, Yi Xu, Xinjian Zhang, Minyi Zhao, Shuigeng Zhou, Kelvin C. K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy, Xin Li, Fanglong Liu, He Zheng, Lielin Jiang, Qi Zhang, Dongliang He, Fu Li, Qingqing Dang, Yibin Huang, Matteo Maggioni, Zhongqian Fu, Shuai Xiao, Cheng li, Thomas Tanay, Fenglong Song, Wentao Chao, Qiang Guo, Yan Liu, Jiang Li, Xiaochao Qu, Dewang Hou, Jiayu Yang, Lyn Jiang, Di You, Zhenyu Zhang, Chong Mou, Iaroslav Koshelev, Pavel Ostyakov, Andrey Somov, Jia Hao, Xueyi Zou, Shijie Zhao, Xiaopeng Sun, Yiting Liao, Yuanzhi Zhang, Qing Wang, Gen Zhan, Mengxi Guo, Junlin Li, Ming Lu, Zhan Ma, Pablo Navarrete Michelini, Hai Wang, Yiyun Chen, Jingyu Guo, Liliang Zhang, Wenming Yang, Sijung Kim, Syehoon Oh, Yucong Wang, Minjie Cai, Wei Hao, Kangdi Shi, Liangyan Li, Jun Chen, Wei Gao, Wang Liu, Xiaoyu Zhang, Linjie Zhou, Sixin Lin, Ru Wang* - `2104.10781v6` - [abs](http://arxiv.org/abs/2104.10781v6) - [pdf](http://arxiv.org/pdf/2104.10781v6) > This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. In this challenge, the new Large-scale Diverse Video (LDV) dataset is employed. The challenge has three tracks. Tracks 1 and 2 aim at enhancing the videos compressed by HEVC at a fixed QP, while Track 3 is designed for enhancing the videos compressed by x265 at a fixed bit-rate. Besides, the quality enhancement of Tracks 1 and 3 targets at improving the fidelity (PSNR), and Track 2 targets at enhancing the perceptual quality. The three tracks totally attract 482 registrations. In the test phase, 12 teams, 8 teams and 11 teams submitted the final results of Tracks 1, 2 and 3, respectively. The proposed methods and solutions gauge the state-of-the-art of video quality enhancement. The homepage of the challenge: https://github.com/RenYang-home/NTIRE21_VEnh
2022-08-31 13:58:00 - Automated Clinical Coding: What, Why, and Where We Are? - *Hang Dong, Matúš Falis, William Whiteley, Beatrice Alex, Joshua Matterson, Shaoxiong Ji, Jiaoyan Chen, Honghan Wu* - `2203.11092v2` - [abs](http://arxiv.org/abs/2203.11092v2) - [pdf](http://arxiv.org/pdf/2203.11092v2) > Clinical coding is the task of transforming medical information in a patient's health records into structured codes so that they can be used for statistical analysis. This is a cognitive and time-consuming task that follows a standard process in order to achieve a high level of consistency. Clinical coding could potentially be supported by an automated system to improve the efficiency and accuracy of the process. We introduce the idea of automated clinical coding and summarise its challenges from the perspective of Artificial Intelligence (AI) and Natural Language Processing (NLP), based on the literature, our project experience over the past two and half years (late 2019 - early 2022), and discussions with clinical coding experts in Scotland and the UK. Our research reveals the gaps between the current deep learning-based approach applied to clinical coding and the need for explainability and consistency in real-world practice. Knowledge-based methods that represent and reason the standard, explainable process of a task may need to be incorporated into deep learning-based methods for clinical coding. Automated clinical coding is a promising task for AI, despite the technical and organisational challenges. Coders are needed to be involved in the development process. There is much to achieve to develop and deploy an AI-based automated system to support coding in the next five years and beyond.
2022-08-31 15:52:02 - Cell-Free Latent Go-Explore - *Quentin Gallouédec, Emmanuel Dellandréa* - `2208.14928v1` - [abs](http://arxiv.org/abs/2208.14928v1) - [pdf](http://arxiv.org/pdf/2208.14928v1) > In this paper, we introduce Latent Go-Explore (LGE), a simple and general approach based on the Go-Explore paradigm for exploration in reinforcement learning (RL). Go-Explore was initially introduced with a strong domain knowledge constraint for partitioning the state space into cells. However, in most real-world scenarios, drawing domain knowledge from raw observations is complex and tedious. If the cell partitioning is not informative enough, Go-Explore can completely fail to explore the environment. We argue that the Go-Explore approach can be generalized to any environment without domain knowledge and without cells by exploiting a learned latent representation. Thus, we show that LGE can be flexibly combined with any strategy for learning a latent representation. We show that LGE, although simpler than Go-Explore, is more robust and outperforms all state-of-the-art algorithms in terms of pure exploration on multiple hard-exploration environments. The LGE implementation is available as open-source at https://github.com/qgallouedec/lge.
2022-08-31 17:47:58 - Scan-based immersed isogeometric flow analysis - *Clemens V. Verhoosel, E. Harald van Brummelen, Sai C. Divi, Frits de Prenter* - `2208.14994v1` - [abs](http://arxiv.org/abs/2208.14994v1) - [pdf](http://arxiv.org/pdf/2208.14994v1) > This chapter reviews the work conducted by our team on scan-based immersed isogeometric analysis for flow problems. To leverage the advantageous properties of isogeometric analysis on complex scan-based domains, various innovations have been made: (i) A spline-based segmentation strategy has been developed to extract a geometry suitable for immersed analysis directly from scan data; (ii) A stabilized equal-order velocity-pressure formulation for the Stokes problem has been proposed to attain stable results on immersed domains; (iii) An adaptive integration quadrature procedure has been developed to improve computational efficiency; (iv) A mesh refinement strategy has been developed to capture small features at a priori unknown locations, without drastically increasing the computational cost of the scan-based analysis workflow. We review the key ideas behind each of these innovations, and illustrate these using a selection of simulation results from our work. A patient-specific scan-based analysis case is reproduced to illustrate how these innovations enable the simulation of flow problems on complex scan data.
2022-08-31 20:32:35 - Efficient Methods for Natural Language Processing: A Survey - *Marcos Treviso, Tianchu Ji, Ji-Ung Lee, Betty van Aken, Qingqing Cao, Manuel R. Ciosici, Michael Hassid, Kenneth Heafield, Sara Hooker, Pedro H. Martins, André F. T. Martins, Peter Milder, Colin Raffel, Edwin Simpson, Noam Slonim, Niranjan Balasubramanian, Leon Derczynski, Roy Schwartz* - `2209.00099v1` - [abs](http://arxiv.org/abs/2209.00099v1) - [pdf](http://arxiv.org/pdf/2209.00099v1) > Getting the most out of limited resources allows advances in natural language processing (NLP) research and practice while being conservative with resources. Those resources may be data, time, storage, or energy. Recent work in NLP has yielded interesting results from scaling; however, using only scale to improve results means that resource consumption also scales. That relationship motivates research into efficient methods that require less resources to achieve similar results. This survey relates and synthesises methods and findings in those efficiencies in NLP, aiming to guide new researchers in the field and inspire the development of new methods.

2022-09

2022-09-01 02:48:56 - Satellite Image Based Cross-view Localization for Autonomous Vehicle - *Shan Wang, Yanhao Zhang, Hongdong Li* - `2207.13506v2` - [abs](http://arxiv.org/abs/2207.13506v2) - [pdf](http://arxiv.org/pdf/2207.13506v2) > Existing spatial localization techniques for autonomous vehicles mostly use a pre-built 3D-HD map, often constructed using a survey-grade 3D mapping vehicle, which is not only expensive but also laborious. This paper shows that by using an off-the-shelf high-definition satellite image as a ready-to-use map, we are able to achieve cross-view vehicle localization up to a satisfactory accuracy, providing a cheaper and more practical way for localization. Although the idea of using satellite images for cross-view localization is not new, previous methods almost exclusively treat the task as image retrieval, namely matching a vehicle-captured ground-view image with the satellite image. This paper presents a novel cross-view localization method, which departs from the common wisdom of image retrieval. Specifically, our method develops (1) a Geometric-align Feature Extractor (GaFE) that leverages measured 3D points to bridge the geometric gap between ground view and overhead view, (2) a Pose Aware Branch (PAB) adopting a triplet loss to encourage pose-aware feature extracting, and (3) a Recursive Pose Refine Branch (RPRB) using the Levenberg-Marquardt (LM) algorithm to align the initial pose towards the true vehicle pose iteratively. Our method is validated on KITTI and Ford Multi-AV Seasonal datasets as ground view and Google Maps as the satellite view. The results demonstrate the superiority of our method in cross-view localization with spatial and angular errors within 1 meter and $2^\circ$, respectively. The code will be made publicly available.
2022-09-01 04:11:48 - Review-Based Domain Disentanglement without Duplicate Users or Contexts for Cross-Domain Recommendation - *Yoonhyuk Choi, Jiho Choi, Taewook Ko, Hyungho Byun, Chong-Kwon Kim* - `2110.12648v2` - [abs](http://arxiv.org/abs/2110.12648v2) - [pdf](http://arxiv.org/pdf/2110.12648v2) > A cross-domain recommendation has shown promising results in solving data-sparsity and cold-start problems. Despite such progress, existing methods focus on domain-shareable information (overlapped users or same contexts) for a knowledge transfer, and they fail to generalize well without such requirements. To deal with these problems, we suggest utilizing review texts that are general to most e-commerce systems. Our model (named SER) uses three text analysis modules, guided by a single domain discriminator for disentangled representation learning. Here, we suggest a novel optimization strategy that can enhance the quality of domain disentanglement, and also debilitates detrimental information of a source domain. Also, we extend the encoding network from a single to multiple domains, which has proven to be powerful for review-based recommender systems. Extensive experiments and ablation studies demonstrate that our method is efficient, robust, and scalable compared to the state-of-the-art single and cross-domain recommendation methods.
2022-09-01 04:38:42 - Atomist or Holist? A Diagnosis and Vision for More Productive Interdisciplinary AI Ethics Dialogue - *Travis Greene, Amit Dhurandhar, Galit Shmueli* - `2208.09174v2` - [abs](http://arxiv.org/abs/2208.09174v2) - [pdf](http://arxiv.org/pdf/2208.09174v2) > In response to growing recognition of the social, legal, and ethical impacts of new AI-based technologies, major AI and ML conferences and journals now encourage or require submitted papers to include ethics impact statements and undergo ethics reviews. This move has sparked heated debate concerning the role of ethics in AI and data science research, at times devolving into counter-productive name-calling and threats of "cancellation." We diagnose this deep ideological conflict as one between atomists and holists. Among other things, atomists espouse the idea that facts are and should be kept separate from values, while holists believe facts and values are and should be inextricable from one another. With the goals of encouraging civil discourse across disciplines and reducing disciplinary polarization, we draw on a variety of historical sources ranging from philosophy and law, to social theory and humanistic psychology, to describe each ideology's beliefs and assumptions. Finally, we call on atomists and holists within the data science community to exhibit greater empathy during ethical disagreements and propose four targeted strategies to ensure data science research benefits society.
2022-09-01 13:35:29 - Wild Patterns Reloaded: A Survey of Machine Learning Security against Training Data Poisoning - *Antonio Emanuele Cinà, Kathrin Grosse, Ambra Demontis, Sebastiano Vascon, Werner Zellinger, Bernhard A. Moser, Alina Oprea, Battista Biggio, Marcello Pelillo, Fabio Roli* - `2205.01992v2` - [abs](http://arxiv.org/abs/2205.01992v2) - [pdf](http://arxiv.org/pdf/2205.01992v2) > The success of machine learning is fueled by the increasing availability of computing power and large training datasets. The training data is used to learn new models or update existing ones, assuming that it is sufficiently representative of the data that will be encountered at test time. This assumption is challenged by the threat of poisoning, an attack that manipulates the training data to compromise the model's performance at test time. Although poisoning has been acknowledged as a relevant threat in industry applications, and a variety of different attacks and defenses have been proposed so far, a complete systematization and critical review of the field is still missing. In this survey, we provide a comprehensive systematization of poisoning attacks and defenses in machine learning, reviewing more than 100 papers published in the field in the last 15 years. We start by categorizing the current threat models and attacks, and then organize existing defenses accordingly. While we focus mostly on computer-vision applications, we argue that our systematization also encompasses state-of-the-art attacks and defenses for other data modalities. Finally, we discuss existing resources for research in poisoning, and shed light on the current limitations and open research questions in this research field.
2022-09-01 15:01:32 - The Neural Process Family: Survey, Applications and Perspectives - *Saurav Jha, Dong Gong, Xuesong Wang, Richard E. Turner, Lina Yao* - `2209.00517v1` - [abs](http://arxiv.org/abs/2209.00517v1) - [pdf](http://arxiv.org/pdf/2209.00517v1) > The standard approaches to neural network implementation yield powerful function approximation capabilities but are limited in their abilities to learn meta representations and reason probabilistic uncertainties in their predictions. Gaussian processes, on the other hand, adopt the Bayesian learning scheme to estimate such uncertainties but are constrained by their efficiency and approximation capacity. The Neural Processes Family (NPF) intends to offer the best of both worlds by leveraging neural networks for meta-learning predictive uncertainties. Such potential has brought substantial research activity to the family in recent years. Therefore, a comprehensive survey of NPF models is needed to organize and relate their motivation, methodology, and experiments. This paper intends to address this gap while digging deeper into the formulation, research themes, and applications concerning the family members. We shed light on their potential to bring several recent advances in other deep learning domains under one umbrella. We then provide a rigorous taxonomy of the family and empirically demonstrate their capabilities for modeling data generating functions operating on 1-d, 2-d, and 3-d input domains. We conclude by discussing our perspectives on the promising directions that can fuel the research advances in the field. Code for our experiments will be made available at https://github.com/srvCodes/neural-processes-survey.
2022-09-02 06:54:32 - Geometric and Learning-based Mesh Denoising: A Comprehensive Survey - *Honghua Chen, Mingqiang Wei, Jun Wang* - `2209.00841v1` - [abs](http://arxiv.org/abs/2209.00841v1) - [pdf](http://arxiv.org/pdf/2209.00841v1) > Mesh denoising is a fundamental problem in digital geometry processing. It seeks to remove surface noise, while preserving surface intrinsic signals as accurately as possible. While the traditional wisdom has been built upon specialized priors to smooth surfaces, learning-based approaches are making their debut with great success in generalization and automation. In this work, we provide a comprehensive review of the advances in mesh denoising, containing both traditional geometric approaches and recent learning-based methods. First, to familiarize readers with the denoising tasks, we summarize four common issues in mesh denoising. We then provide two categorizations of the existing denoising methods. Furthermore, three important categories, including optimization-, filter-, and data-driven-based techniques, are introduced and analyzed in detail, respectively. Both qualitative and quantitative comparisons are illustrated, to demonstrate the effectiveness of the state-of-the-art denoising methods. Finally, potential directions of future work are pointed out to solve the common problems of these approaches. A mesh denoising benchmark is also built in this work, and future researchers will easily and conveniently evaluate their methods with the state-of-the-art approaches.
2022-09-02 08:48:34 - Self-supervised Learning in Remote Sensing: A Review - *Yi Wang, Conrad M Albrecht, Nassim Ait Ali Braham, Lichao Mou, Xiao Xiang Zhu* - `2206.13188v2` - [abs](http://arxiv.org/abs/2206.13188v2) - [pdf](http://arxiv.org/pdf/2206.13188v2) > In deep learning research, self-supervised learning (SSL) has received great attention triggering interest within both the computer vision and remote sensing communities. While there has been a big success in computer vision, most of the potential of SSL in the domain of earth observation remains locked. In this paper, we provide an introduction to, and a review of the concepts and latest developments in SSL for computer vision in the context of remote sensing. Further, we provide a preliminary benchmark of modern SSL algorithms on popular remote sensing datasets, verifying the potential of SSL in remote sensing and providing an extended study on data augmentations. Finally, we identify a list of promising directions of future research in SSL for earth observation (SSL4EO) to pave the way for fruitful interaction of both domains.
2022-09-02 10:46:40 - Goal-Conditioned Reinforcement Learning: Problems and Solutions - *Minghuan Liu, Menghui Zhu, Weinan Zhang* - `2201.08299v3` - [abs](http://arxiv.org/abs/2201.08299v3) - [pdf](http://arxiv.org/pdf/2201.08299v3) > Goal-conditioned reinforcement learning (GCRL), related to a set of complex RL problems, trains an agent to achieve different goals under particular scenarios. Compared to the standard RL solutions that learn a policy solely depending on the states or observations, GCRL additionally requires the agent to make decisions according to different goals. In this survey, we provide a comprehensive overview of the challenges and algorithms for GCRL. Firstly, we answer what the basic problems are studied in this field. Then, we explain how goals are represented and present how existing solutions are designed from different points of view. Finally, we make the conclusion and discuss potential future prospects that recent researches focus on.
2022-09-02 15:30:35 - Deep Learning for Face Anti-Spoofing: A Survey - *Zitong Yu, Yunxiao Qin, Xiaobai Li, Chenxu Zhao, Zhen Lei, Guoying Zhao* - `2106.14948v3` - [abs](http://arxiv.org/abs/2106.14948v3) - [pdf](http://arxiv.org/pdf/2106.14948v3) > Face anti-spoofing (FAS) has lately attracted increasing attention due to its vital role in securing face recognition systems from presentation attacks (PAs). As more and more realistic PAs with novel types spring up, traditional FAS methods based on handcrafted features become unreliable due to their limited representation capacity. With the emergence of large-scale academic datasets in the recent decade, deep learning based FAS achieves remarkable performance and dominates this area. However, existing reviews in this field mainly focus on the handcrafted features, which are outdated and uninspiring for the progress of FAS community. In this paper, to stimulate future research, we present the first comprehensive review of recent advances in deep learning based FAS. It covers several novel and insightful components: 1) besides supervision with binary label (e.g., '0' for bonafide vs. '1' for PAs), we also investigate recent methods with pixel-wise supervision (e.g., pseudo depth map); 2) in addition to traditional intra-dataset evaluation, we collect and analyze the latest methods specially designed for domain generalization and open-set FAS; and 3) besides commercial RGB camera, we summarize the deep learning applications under multi-modal (e.g., depth and infrared) or specialized (e.g., light field and flash) sensors. We conclude this survey by emphasizing current open issues and highlighting potential prospects.
2022-09-02 17:57:05 - Transformers in Remote Sensing: A Survey - *Abdulaziz Amer Aleissaee, Amandeep Kumar, Rao Muhammad Anwer, Salman Khan, Hisham Cholakkal, Gui-Song Xia, Fahad Shahbaz khan* - `2209.01206v1` - [abs](http://arxiv.org/abs/2209.01206v1) - [pdf](http://arxiv.org/pdf/2209.01206v1) > Deep learning-based algorithms have seen a massive popularity in different areas of remote sensing image analysis over the past decade. Recently, transformers-based architectures, originally introduced in natural language processing, have pervaded computer vision field where the self-attention mechanism has been utilized as a replacement to the popular convolution operator for capturing long-range dependencies. Inspired by recent advances in computer vision, remote sensing community has also witnessed an increased exploration of vision transformers for a diverse set of tasks. Although a number of surveys have focused on transformers in computer vision in general, to the best of our knowledge we are the first to present a systematic review of recent advances based on transformers in remote sensing. Our survey covers more than 60 recent transformers-based methods for different remote sensing problems in sub-areas of remote sensing: very high-resolution (VHR), hyperspectral (HSI) and synthetic aperture radar (SAR) imagery. We conclude the survey by discussing different challenges and open issues of transformers in remote sensing. Additionally, we intend to frequently update and maintain the latest transformers in remote sensing papers with their respective code at: https://github.com/VIROBO-15/Transformer-in-Remote-Sensing
2022-09-03 09:02:31 - Text classification problems via BERT embedding method and graph convolutional neural network - *Loc Hoang Tran, Tuan Tran, An Mai* - `2111.15379v3` - [abs](http://arxiv.org/abs/2111.15379v3) - [pdf](http://arxiv.org/pdf/2111.15379v3) > This paper presents the novel way combining the BERT embedding method and the graph convolutional neural network. This combination is employed to solve the text classification problem. Initially, we apply the BERT embedding method to the texts (in the BBC news dataset and the IMDB movie reviews dataset) in order to transform all the texts to numerical vector. Then, the graph convolutional neural network will be applied to these numerical vectors to classify these texts into their ap-propriate classes/labels. Experiments show that the performance of the graph convolutional neural network model is better than the perfor-mances of the combination of the BERT embedding method with clas-sical machine learning models.
2022-09-03 11:41:42 - Deep learning automates bidimensional and volumetric tumor burden measurement from MRI in pre- and post-operative glioblastoma patients - *Jakub Nalepa, Krzysztof Kotowski, Bartosz Machura, Szymon Adamski, Oskar Bozek, Bartosz Eksner, Bartosz Kokoszka, Tomasz Pekala, Mateusz Radom, Marek Strzelczak, Lukasz Zarudzki, Agata Krason, Filippo Arcadu, Jean Tessier* - `2209.01402v1` - [abs](http://arxiv.org/abs/2209.01402v1) - [pdf](http://arxiv.org/pdf/2209.01402v1) > Tumor burden assessment by magnetic resonance imaging (MRI) is central to the evaluation of treatment response for glioblastoma. This assessment is complex to perform and associated with high variability due to the high heterogeneity and complexity of the disease. In this work, we tackle this issue and propose a deep learning pipeline for the fully automated end-to-end analysis of glioblastoma patients. Our approach simultaneously identifies tumor sub-regions, including the enhancing tumor, peritumoral edema and surgical cavity in the first step, and then calculates the volumetric and bidimensional measurements that follow the current Response Assessment in Neuro-Oncology (RANO) criteria. Also, we introduce a rigorous manual annotation process which was followed to delineate the tumor sub-regions by the human experts, and to capture their segmentation confidences that are later used while training the deep learning models. The results of our extensive experimental study performed over 760 pre-operative and 504 post-operative adult patients with glioma obtained from the public database (acquired at 19 sites in years 2021-2020) and from a clinical treatment trial (47 and 69 sites for pre-/post-operative patients, 2009-2011) and backed up with thorough quantitative, qualitative and statistical analysis revealed that our pipeline performs accurate segmentation of pre- and post-operative MRIs in a fraction of the manual delineation time (up to 20 times faster than humans). The bidimensional and volumetric measurements were in strong agreement with expert radiologists, and we showed that RANO measurements are not always sufficient to quantify tumor burden.
2022-09-03 14:25:39 - A comprehensive survey on recent deep learning-based methods applied to surgical data - *Mansoor Ali, Rafael Martinez Garcia Pena, Gilberto Ochoa Ruiz, Sharib Ali* - `2209.01435v1` - [abs](http://arxiv.org/abs/2209.01435v1) - [pdf](http://arxiv.org/pdf/2209.01435v1) > Minimally invasive surgery is highly operator dependant with lengthy procedural times causing fatigue and risk to patients. In order to mitigate these risks, real-time systems can help assist surgeons to navigate and track tools, by providing clear understanding of scene and avoid miscalculations during operation. While several efforts have been made in this direction, a lack of diverse datasets, as well as very dynamic scenes and its variability in each patient entails major hurdle in accomplishing robust systems. In this work, we present a systematic review of recent machine learning-based approaches including surgical tool localisation, segmentation, tracking and 3D scene perception. Furthermore, we present current gaps and directions of these invented methods and provide rational behind clinical integration of these approaches.
2022-09-03 17:50:27 - Machine learning for dynamically predicting the onset of renal replacement therapy in chronic kidney disease patients using claims data - *Daniel Lopez-Martinez, Christina Chen, Ming-Jun Chen* - `2209.01469v1` - [abs](http://arxiv.org/abs/2209.01469v1) - [pdf](http://arxiv.org/pdf/2209.01469v1) > Chronic kidney disease (CKD) represents a slowly progressive disorder that can eventually require renal replacement therapy (RRT) including dialysis or renal transplantation. Early identification of patients who will require RRT (as much as 1 year in advance) improves patient outcomes, for example by allowing higher-quality vascular access for dialysis. Therefore, early recognition of the need for RRT by care teams is key to successfully managing the disease. Unfortunately, there is currently no commonly used predictive tool for RRT initiation. In this work, we present a machine learning model that dynamically identifies CKD patients at risk of requiring RRT up to one year in advance using only claims data. To evaluate the model, we studied approximately 3 million Medicare beneficiaries for which we made over 8 million predictions. We showed that the model can identify at risk patients with over 90% sensitivity and specificity. Although additional work is required before this approach is ready for clinical use, this study provides a basis for a screening tool to identify patients at risk within a time window that enables early proactive interventions intended to improve RRT outcomes.
2022-09-04 09:27:57 - Rice Leaf Disease Classification and Detection Using YOLOv5 - *Md Ershadul Haque, Ashikur Rahman, Iftekhar Junaeid, Samiul Ul Hoque, Manoranjan Paul* - `2209.01579v1` - [abs](http://arxiv.org/abs/2209.01579v1) - [pdf](http://arxiv.org/pdf/2209.01579v1) > A staple food in more than a hundred nations worldwide is rice (Oryza sativa). The cultivation of rice is vital to global economic growth. However, the main issue facing the agricultural industry is rice leaf disease. The quality and quantity of the crops have declined, and this is the main cause. As farmers in any country do not have much knowledge about rice leaf disease, they cannot diagnose rice leaf disease properly. That's why they cannot take proper care of rice leaves. As a result, the production is decreasing. From literature survey, it has seen that YOLOv5 exhibit the better result compare to others deep learning method. As a result of the continual advancement of object detection technology, YOLO family algorithms, which have extraordinarily high precision and better speed have been used in various scene recognition tasks to build rice leaf disease monitoring systems. We have annotate 1500 collected data sets and propose a rice leaf disease classification and detection method based on YOLOv5 deep learning. We then trained and evaluated the YOLOv5 model. The simulation outcomes show improved object detection result for the augmented YOLOv5 network proposed in this article. The required levels of recognition precision, recall, mAP value, and F1 score are 90\%, 67\%, 76\%, and 81\% respectively are considered as performance metrics.
2022-09-04 12:48:30 - Generalization in Neural Networks: A Broad Survey - *Chris Rohlfs* - `2209.01610v1` - [abs](http://arxiv.org/abs/2209.01610v1) - [pdf](http://arxiv.org/pdf/2209.01610v1) > This paper reviews concepts, modeling approaches, and recent findings along a spectrum of different levels of abstraction of neural network models including generalization across (1) Samples, (2) Distributions, (3) Domains, (4) Tasks, (5) Modalities, and (6) Scopes. Results on (1) sample generalization show that, in the case of ImageNet, nearly all the recent improvements reduced training error while overfitting stayed flat; with nearly all the training error eliminated, future progress will require a focus on reducing overfitting. Perspectives from statistics highlight how (2) distribution generalization can be viewed alternately as a change in sample weights or a change in the input-output relationship. Transfer learning approaches to (3) domain generalization are summarized, as are recent advances and the wealth of domain adaptation benchmark datasets available. Recent breakthroughs surveyed in (4) task generalization include few-shot meta-learning approaches and the BERT NLP engine, and recent (5) modality generalization studies are discussed that integrate image and text data and that apply a biologically-inspired network across olfactory, visual, and auditory modalities. Recent (6) scope generalization results are reviewed that embed knowledge graphs into deep NLP approaches. Additionally, concepts from neuroscience are discussed on the modular architecture of brains and the steps by which dopamine-driven conditioning leads to abstract thinking.
2022-09-04 13:46:54 - Interactive Question Answering Systems: Literature Review - *Giovanni Maria Biancofiore, Yashar Deldjoo, Tommaso Di Noia, Eugenio Di Sciascio, Fedelucio Narducci* - `2209.01621v1` - [abs](http://arxiv.org/abs/2209.01621v1) - [pdf](http://arxiv.org/pdf/2209.01621v1) > Question answering systems are recognized as popular and frequently effective means of information seeking on the web. In such systems, information seekers can receive a concise response to their query by presenting their questions in natural language. Interactive question answering is a recently proposed and increasingly popular solution that resides at the intersection of question answering and dialogue systems. On the one hand, the user can ask questions in normal language and locate the actual response to her inquiry; on the other hand, the system can prolong the question-answering session into a dialogue if there are multiple probable replies, very few, or ambiguities in the initial request. By permitting the user to ask more questions, interactive question answering enables users to dynamically interact with the system and receive more precise results. This survey offers a detailed overview of the interactive question-answering methods that are prevalent in current literature. It begins by explaining the foundational principles of question-answering systems, hence defining new notations and taxonomies to combine all identified works inside a unified framework. The reviewed published work on interactive question-answering systems is then presented and examined in terms of its proposed methodology, evaluation approaches, and dataset/application domain. We also describe trends surrounding specific tasks and issues raised by the community, so shedding light on the future interests of scholars. Our work is further supported by a GitHub page with a synthesis of all the major topics covered in this literature study. https://sisinflab.github.io/interactive-question-answering-systems-survey/
2022-09-04 18:00:29 - A Review of Sparse Expert Models in Deep Learning - *William Fedus, Jeff Dean, Barret Zoph* - `2209.01667v1` - [abs](http://arxiv.org/abs/2209.01667v1) - [pdf](http://arxiv.org/pdf/2209.01667v1) > Sparse expert models are a thirty-year old concept re-emerging as a popular architecture in deep learning. This class of architecture encompasses Mixture-of-Experts, Switch Transformers, Routing Networks, BASE layers, and others, all with the unifying idea that each example is acted on by a subset of the parameters. By doing so, the degree of sparsity decouples the parameter count from the compute per example allowing for extremely large, but efficient models. The resulting models have demonstrated significant improvements across diverse domains such as natural language processing, computer vision, and speech recognition. We review the concept of sparse expert models, provide a basic description of the common algorithms, contextualize the advances in the deep learning era, and conclude by highlighting areas for future work.
2022-09-05 02:13:57 - Imaging with Equivariant Deep Learning - *Dongdong Chen, Mike Davies, Matthias J. Ehrhardt, Carola-Bibiane Schönlieb, Ferdia Sherry, Julián Tachella* - `2209.01725v1` - [abs](http://arxiv.org/abs/2209.01725v1) - [pdf](http://arxiv.org/pdf/2209.01725v1) > From early image processing to modern computational imaging, successful models and algorithms have relied on a fundamental property of natural signals: symmetry. Here symmetry refers to the invariance property of signal sets to transformations such as translation, rotation or scaling. Symmetry can also be incorporated into deep neural networks in the form of equivariance, allowing for more data-efficient learning. While there has been important advances in the design of end-to-end equivariant networks for image classification in recent years, computational imaging introduces unique challenges for equivariant network solutions since we typically only observe the image through some noisy ill-conditioned forward operator that itself may not be equivariant. We review the emerging field of equivariant imaging and show how it can provide improved generalization and new imaging opportunities. Along the way we show the interplay between the acquisition physics and group actions and links to iterative reconstruction, blind compressed sensing and self-supervised learning.
2022-09-05 07:04:02 - A Comprehensive Review of Deep Learning-based Single Image Super-resolution - *Syed Muhammad Arsalan Bashir, Yi Wang, Mahrukh Khan, Yilong Niu* - `2102.09351v3` - [abs](http://arxiv.org/abs/2102.09351v3) - [pdf](http://arxiv.org/pdf/2102.09351v3) > Image super-resolution (SR) is one of the vital image processing methods that improve the resolution of an image in the field of computer vision. In the last two decades, significant progress has been made in the field of super-resolution, especially by utilizing deep learning methods. This survey is an effort to provide a detailed survey of recent progress in single-image super-resolution in the perspective of deep learning while also informing about the initial classical methods used for image super-resolution. The survey classifies the image SR methods into four categories, i.e., classical methods, supervised learning-based methods, unsupervised learning-based methods, and domain-specific SR methods. We also introduce the problem of SR to provide intuition about image quality metrics, available reference datasets, and SR challenges. Deep learning-based approaches of SR are evaluated using a reference dataset. Some of the reviewed state-of-the-art image SR methods include the enhanced deep SR network (EDSR), cycle-in-cycle GAN (CinCGAN), multiscale residual network (MSRN), meta residual dense network (Meta-RDN), recurrent back-projection network (RBPN), second-order attention network (SAN), SR feedback network (SRFBN) and the wavelet-based residual attention network (WRAN). Finally, this survey is concluded with future directions and trends in SR and open problems in SR to be addressed by the researchers.
2022-09-05 08:02:52 - The Face of Affective Disorders - *Christian S. Pilz, Benjamin Clemens, Inka C. Hiss, Christoph Weiss, Ulrich Canzler, Jarek Krajewski, Ute Habel, Steffen Leonhardt* - `2208.01369v3` - [abs](http://arxiv.org/abs/2208.01369v3) - [pdf](http://arxiv.org/pdf/2208.01369v3) > We study the statistical properties of facial behaviour altered by the regulation of brain arousal in the clinical domain of psychiatry. The underlying mechanism is linked to the empirical interpretation of the vigilance continuum as behavioral surrogate measurement for certain states of mind. Referring to the classical scalp-based obtrusive measurements, we name the presented method Opto-Electronic Encephalography (OEG) which solely relies on modern camera-based real-time signal processing and computer vision. Based upon a stochastic representation as coherence of the face dynamics, reflecting the hemifacial asymmetry in emotion expressions, we demonstrate an almost flawless distinction between patients and healthy controls as well as between the mental disorders depression and schizophrenia and the symptom severity. In contrast to the standard diagnostic process, which is time-consuming, subjective and does not incorporate neurobiological data such as real-time face dynamics, the objective stochastic modeling of the affective responsiveness only requires a few minutes of video-based facial recordings. We also highlight the potential of the methodology as a causal inference model in transdiagnostic analysis to predict the outcome of pharmacological treatment. All results are obtained on a clinical longitudinal data collection with an amount of 99 patients and 43 controls.
2022-09-05 08:19:26 - A Survey on Measuring and Mitigating Reasoning Shortcuts in Machine Reading Comprehension - *Xanh Ho, Johannes Mario Meissner, Saku Sugawara, Akiko Aizawa* - `2209.01824v1` - [abs](http://arxiv.org/abs/2209.01824v1) - [pdf](http://arxiv.org/pdf/2209.01824v1) > The issue of shortcut learning is widely known in NLP and has been an important research focus in recent years. Unintended correlations in the data enable models to easily solve tasks that were meant to exhibit advanced language understanding and reasoning capabilities. In this survey paper, we focus on the field of machine reading comprehension (MRC), an important task for showcasing high-level language understanding that also suffers from a range of shortcuts. We summarize the available techniques for measuring and mitigating shortcuts and conclude with suggestions for further progress in shortcut research. Most importantly, we highlight two main concerns for shortcut mitigation in MRC: the lack of public challenge sets, a necessary component for effective and reusable evaluation, and the lack of certain mitigation techniques that are prominent in other areas.
2022-09-05 08:56:43 - Which structure of academic articles do referees pay more attention to?: perspective of peer review and full-text of academic articles - *Chenglei Qin, Chengzhi Zhang* - `2209.01841v1` - [abs](http://arxiv.org/abs/2209.01841v1) - [pdf](http://arxiv.org/pdf/2209.01841v1) > Purpose The purpose of this paper is to explore which structures of academic articles referees would pay more attention to, what specific content referees focus on, and whether the distribution of PRC is related to the citations. Design/methodology/approach Firstly, utilizing the feature words of section title and hierarchical attention network model (HAN) to identify the academic article structures. Secondly, analyzing the distribution of PRC in different structures according to the position information extracted by rules in PRC. Thirdly, analyzing the distribution of feature words of PRC extracted by the Chi-square test and TF-IDF in different structures. Finally, four correlation analysis methods are used to analyze whether the distribution of PRC in different structures is correlated to the citations. Findings The count of PRC distributed in Materials and Methods and Results section is significantly more than that in the structure of Introduction and Discussion, indicating that referees pay more attention to the Material and Methods and Results. The distribution of feature words of PRC in different structures is obviously different, which can reflect the content of referees' concern. There is no correlation between the distribution of PRC in different structures and the citations. Research limitations/implications Due to the differences in the way referees write peer review reports, the rules used to extract position information cannot cover all PRC. Originality/value The paper finds a pattern in the distribution of PRC in different academic article structures proving the long-term empirical understanding. It also provides insight into academic article writing: researchers should ensure the scientificity of methods and the reliability of results when writing academic article to obtain a high degree of recognition from referees.
2022-09-05 12:18:48 - Text-based automatic personality prediction: A bibliographic review - *Ali-Reza Feizi-Derakhshi, Mohammad-Reza Feizi-Derakhshi, Majid Ramezani, Narjes Nikzad-Khasmakhi, Meysam Asgari-Chenaghlu, Taymaz Akan, Mehrdad Ranjbar-Khadivi, Elnaz Zafarni-Moattar, Zoleikha Jahanbakhsh-Naghadeh* - `2110.01186v3` - [abs](http://arxiv.org/abs/2110.01186v3) - [pdf](http://arxiv.org/pdf/2110.01186v3) > Personality detection is an old topic in psychology and Automatic Personality Prediction (or Perception) (APP) is the automated (computationally) forecasting of the personality on different types of human generated/exchanged contents (such as text, speech, image, video). The principal objective of this study is to offer a shallow (overall) review of natural language processing approaches on APP since 2010. With the advent of deep learning and following it transfer-learning and pre-trained model in NLP, APP research area has been a hot topic, so in this review, methods are categorized into three; pre-trained independent, pre-trained model based, multimodal approaches. Also, to achieve a comprehensive comparison, reported results are informed by datasets.
2022-09-05 17:16:44 - "Dummy Grandpa, do you know anything?": Identifying and Characterizing Ad hominem Fallacy Usage in the Wild - *Utkarsh Patel, Animesh Mukherjee, Mainack Mondal* - `2209.02062v1` - [abs](http://arxiv.org/abs/2209.02062v1) - [pdf](http://arxiv.org/pdf/2209.02062v1) > Today, participating in discussions on online forums is extremely commonplace and these discussions have started rendering a strong influence on the overall opinion of online users. Naturally, twisting the flow of the argument can have a strong impact on the minds of naive users, which in the long run might have socio-political ramifications, for example, winning an election or spreading targeted misinformation. Thus, these platforms are potentially highly vulnerable to malicious players who might act individually or as a cohort to breed fallacious arguments with a motive to sway public opinion. Ad hominem arguments are one of the most effective forms of such fallacies. Although a simple fallacy, it is effective enough to sway public debates in offline world and can be used as a precursor to shutting down the voice of opposition by slander. In this work, we take a first step in shedding light on the usage of ad hominem fallacies in the wild. First, we build a powerful ad hominem detector with high accuracy (F1 more than 83%, showing a significant improvement over prior work), even for datasets for which annotated instances constitute a very small fraction. We then used our detector on 265k arguments collected from the online debate forum - CreateDebate. Our crowdsourced surveys validate our in-the-wild predictions on CreateDebate data (94% match with manual annotation). Our analysis revealed that a surprising 31.23% of CreateDebate content contains ad hominem fallacy, and a cohort of highly active users post significantly more ad hominem to suppress opposing views. Then, our temporal analysis revealed that ad hominem argument usage increased significantly since the 2016 US Presidential election, not only for topics like Politics, but also for Science and Law. We conclude by discussing important implications of our work to detect and defend against ad hominem fallacies.
2022-09-05 17:23:58 - Trust in Language Grounding: a new AI challenge for human-robot teams - *David M. Bossens, Christine Evers* - `2209.02066v1` - [abs](http://arxiv.org/abs/2209.02066v1) - [pdf](http://arxiv.org/pdf/2209.02066v1) > The challenge of language grounding is to fully understand natural language by grounding language in real-world referents. While AI techniques are available, the widespread adoption and effectiveness of such technologies for human-robot teams relies critically on user trust. This survey provides three contributions relating to the newly emerging field of trust in language grounding, including a) an overview of language grounding research in terms of AI technologies, data sets, and user interfaces; b) six hypothesised trust factors relevant to language grounding, which are tested empirically on a human-robot cleaning team; and c) future research directions for trust in language grounding.
2022-09-05 18:00:00 - The SZ flux-mass ($Y$-$M$) relation at low halo masses: improvements with symbolic regression and strong constraints on baryonic feedback - *Digvijay Wadekar, Leander Thiele, J. Colin Hill, Shivam Pandey, Francisco Villaescusa-Navarro, David N. Spergel, Miles Cranmer, Daisuke Nagai, Daniel Anglés-Alcázar, Shirley Ho, Lars Hernquist* - `2209.02075v1` - [abs](http://arxiv.org/abs/2209.02075v1) - [pdf](http://arxiv.org/pdf/2209.02075v1) > Ionized gas in the halo circumgalactic medium leaves an imprint on the cosmic microwave background via the thermal Sunyaev-Zeldovich (tSZ) effect. Feedback from active galactic nuclei (AGN) and supernovae can affect the measurements of the integrated tSZ flux of halos ($Y_\mathrm{SZ}$) and cause its relation with the halo mass ($Y_\mathrm{SZ}-M$) to deviate from the self-similar power-law prediction of the virial theorem. We perform a comprehensive study of such deviations using CAMELS, a suite of hydrodynamic simulations with extensive variations in feedback prescriptions. We use a combination of two machine learning tools (random forest and symbolic regression) to search for analogues of the $Y-M$ relation which are more robust to feedback processes for low masses ($M\lesssim 10^{14}\, h^{-1} \, M_\odot$); we find that simply replacing $Y\rightarrow Y(1+M_*/M_\mathrm{gas})$ in the relation makes it remarkably self-similar. This could serve as a robust multiwavelength mass proxy for low-mass clusters and galaxy groups. Our methodology can also be generally useful to improve the domain of validity of other astrophysical scaling relations. We also forecast that measurements of the $Y-M$ relation could provide percent-level constraints on certain combinations of feedback parameters and/or rule out a major part of the parameter space of supernova and AGN feedback models used in current state-of-the-art hydrodynamic simulations. Our results can be useful for using upcoming SZ surveys (e.g. SO, CMB-S4) and galaxy surveys (e.g. DESI and Rubin) to constrain the nature of baryonic feedback. Finally, we find that the an alternative relation, $Y-M_*$, provides complementary information on feedback than $Y-M$.
2022-09-05 18:35:00 - Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks - *Tilman Räuker, Anson Ho, Stephen Casper, Dylan Hadfield-Menell* - `2207.13243v3` - [abs](http://arxiv.org/abs/2207.13243v3) - [pdf](http://arxiv.org/pdf/2207.13243v3) > The last decade of machine learning has seen drastic increases in scale and capabilities. Deep neural networks (DNNs) are increasingly being deployed in the real world. However, they are generally difficult to analyze, raising concerns about using them without a rigorous understanding of how they function. Effective tools for interpreting them will be important for building more trustworthy AI by helping to identify failures, fix bugs, and improve basic understanding. In particular, "inner" interpretability techniques, which focus on explaining the internal components of DNNs, are well-suited for developing a mechanistic understanding, guiding manual modifications, and reverse engineering solutions. Much recent work has focused on DNN interpretability, and rapid progress has thus far made a thorough systematization of methods difficult. In this survey, we review over 300 works with a focus on inner interpretability tools. We introduce a taxonomy that classifies methods by what part of the network they help to explain (weights, neurons, subnetworks, or latent representations) and whether they are implemented during (intrinsic) or after (post hoc) training. To our knowledge, we are also the first to survey a number of connections between interpretability research and work in adversarial robustness, continual learning, modularity, network compression, and studying the human visual system. Finally, we discuss key challenges and argue for future work emphasizing diagnostics, benchmarking, and robustness.
2022-09-06 02:48:15 - A Multitask Deep Learning Model for Parsing Bridge Elements and Segmenting Defect in Bridge Inspection Images - *Chenyu Zhang, Muhammad Monjurul Karim, Ruwen Qin* - `2209.02190v1` - [abs](http://arxiv.org/abs/2209.02190v1) - [pdf](http://arxiv.org/pdf/2209.02190v1) > The vast network of bridges in the United States raises a high requirement for its maintenance and rehabilitation. The massive cost of manual visual inspection to assess the conditions of the bridges turns out to be a burden to some extent. Advanced robots have been leveraged to automate inspection data collection. Automating the segmentations of multiclass elements, as well as surface defects on the elements, in the large volume of inspection image data would facilitate an efficient and effective assessment of the bridge condition. Training separate single-task networks for element parsing (i.e., semantic segmentation of multiclass elements) and defect segmentation fails to incorporate the close connection between these two tasks in the inspection images where both recognizable structural elements and apparent surface defects are present. This paper is motivated to develop a multitask deep neural network that fully utilizes such interdependence between bridge elements and defects to boost the performance and generalization of the model. Furthermore, the effectiveness of the proposed network designs in improving the task performance was investigated, including feature decomposition, cross-talk sharing, and multi-objective loss function. A dataset with pixel-level labels of bridge elements and corrosion was developed for training and assessment of the models. Quantitative and qualitative results from evaluating the developed multitask deep neural network demonstrate that the recommended network outperforms the independent single-task networks not only in performance (2.59% higher mIoU on bridge parsing and 1.65% on corrosion segmentation) but also in computational time and implementation capability.
2022-09-06 07:39:12 - Medical image analysis based on transformer: A Review - *Zhaoshan Liu, Qiujie Lv, Chau Hung Lee, Lei Shen* - `2208.06643v2` - [abs](http://arxiv.org/abs/2208.06643v2) - [pdf](http://arxiv.org/pdf/2208.06643v2) > The transformer has dominated the natural language processing (NLP) field for a long time. Recently, the transformer-based method has been adopted into the computer vision (CV) field and shows promising results. As an important branch of the CV field, medical image analysis joins the wave of the transformer-based method rightfully. In this review, we illustrate the principle of the attention mechanism, and the detailed structures of the transformer, and depict how the transformer is adopted into medical image analysis. We organize the transformer-based medical image analysis applications in a sequence of different tasks, including classification, segmentation, synthesis, registration, localization, detection, captioning, and denoising. For the mainstream classification and segmentation tasks, we further divided the corresponding works based on different medical imaging modalities. The datasets corresponding to the related works are also organized. We include thirteen modalities and more than twenty objects in our work.
2022-09-06 08:02:55 - Zero-shot Aspect-level Sentiment Classification via Explicit Utilization of Aspect-to-Document Sentiment Composition - *Pengfei Deng, Jianhua Yuan, Yanyan Zhao, Bing Qin* - `2209.02276v1` - [abs](http://arxiv.org/abs/2209.02276v1) - [pdf](http://arxiv.org/pdf/2209.02276v1) > As aspect-level sentiment labels are expensive and labor-intensive to acquire, zero-shot aspect-level sentiment classification is proposed to learn classifiers applicable to new domains without using any annotated aspect-level data. In contrast, document-level sentiment data with ratings are more easily accessible. In this work, we achieve zero-shot aspect-level sentiment classification by only using document-level reviews. Our key intuition is that the sentiment representation of a document is composed of the sentiment representations of all the aspects of that document. Based on this, we propose the AF-DSC method to explicitly model such sentiment composition in reviews. AF-DSC first learns sentiment representations for all potential aspects and then aggregates aspect-level sentiments into a document-level one to perform document-level sentiment classification. In this way, we obtain the aspect-level sentiment classifier as the by-product of the document-level sentiment classifier. Experimental results on aspect-level sentiment classification benchmarks demonstrate the effectiveness of explicit utilization of sentiment composition in document-level sentiment classification. Our model with only 30k training data outperforms previous work utilizing millions of data.
2022-09-06 08:10:48 - Automated Defect Recognition of Castings defects using Neural Networks - *Alberto García-Pérez, María José Gómez-Silva, Arturo de la Escalera* - `2209.02279v1` - [abs](http://arxiv.org/abs/2209.02279v1) - [pdf](http://arxiv.org/pdf/2209.02279v1) > Industrial X-ray analysis is common in aerospace, automotive or nuclear industries where structural integrity of some parts needs to be guaranteed. However, the interpretation of radiographic images is sometimes difficult and may lead to two experts disagree on defect classification. The Automated Defect Recognition (ADR) system presented herein will reduce the analysis time and will also help reducing the subjective interpretation of the defects while increasing the reliability of the human inspector. Our Convolutional Neural Network (CNN) model achieves 94.2\% accuracy (mAP@IoU=50\%), which is considered as similar to expected human performance, when applied to an automotive aluminium castings dataset (GDXray), exceeding current state of the art for this dataset. On an industrial environment, its inference time is less than 400 ms per DICOM image, so it can be installed on production facilities with no impact on delivery time. In addition, an ablation study of the main hyper-parameters to optimise model accuracy from the initial baseline result of 75\% mAP up to 94.2\% mAP, was also conducted.
2022-09-06 13:05:29 - The HoloLens in Medicine: A systematic Review and Taxonomy - *Christina Gsaxner, Jianning Li, Antonio Pepe, Yuan Jin, Jens Kleesiek, Dieter Schmalstieg, Jan Egger* - `2209.03245v1` - [abs](http://arxiv.org/abs/2209.03245v1) - [pdf](http://arxiv.org/pdf/2209.03245v1) > The HoloLens (Microsoft Corp., Redmond, WA), a head-worn, optically see-through augmented reality display, is the main player in the recent boost in medical augmented reality research. In medical settings, the HoloLens enables the physician to obtain immediate insight into patient information, directly overlaid with their view of the clinical scenario, the medical student to gain a better understanding of complex anatomies or procedures, and even the patient to execute therapeutic tasks with improved, immersive guidance. In this systematic review, we provide a comprehensive overview of the usage of the first-generation HoloLens within the medical domain, from its release in March 2016, until the year of 2021, were attention is shifting towards it's successor, the HoloLens 2. We identified 171 relevant publications through a systematic search of the PubMed and Scopus databases. We analyze these publications in regard to their intended use case, technical methodology for registration and tracking, data sources, visualization as well as validation and evaluation. We find that, although the feasibility of using the HoloLens in various medical scenarios has been shown, increased efforts in the areas of precision, reliability, usability, workflow and perception are necessary to establish AR in clinical practice.
2022-09-06 15:01:06 - Explaining Machine Learning Models in Natural Conversations: Towards a Conversational XAI Agent - *Van Bach Nguyen, Jörg Schlötterer, Christin Seifert* - `2209.02552v1` - [abs](http://arxiv.org/abs/2209.02552v1) - [pdf](http://arxiv.org/pdf/2209.02552v1) > The goal of Explainable AI (XAI) is to design methods to provide insights into the reasoning process of black-box models, such as deep neural networks, in order to explain them to humans. Social science research states that such explanations should be conversational, similar to human-to-human explanations. In this work, we show how to incorporate XAI in a conversational agent, using a standard design for the agent comprising natural language understanding and generation components. We build upon an XAI question bank which we extend by quality-controlled paraphrases to understand the user's information needs. We further systematically survey the literature for suitable explanation methods that provide the information to answer those questions, and present a comprehensive list of suggestions. Our work is the first step towards truly natural conversations about machine learning models with an explanation agent. The comprehensive list of XAI questions and the corresponding explanation methods may support other researchers in providing the necessary information to address users' demands.
2022-09-06 16:02:38 - Automatic counting of mounds on UAV images: combining instance segmentation and patch-level correction - *Majid Nikougoftar Nategh, Ahmed Zgaren, Wassim Bouachir, Nizar Bouguila* - `2209.02608v1` - [abs](http://arxiv.org/abs/2209.02608v1) - [pdf](http://arxiv.org/pdf/2209.02608v1) > Site preparation by mounding is a commonly used silvicultural treatment that improves tree growth conditions by mechanically creating planting microsites called mounds. Following site preparation, the next critical step is to count the number of mounds, which provides forest managers with a precise estimate of the number of seedlings required for a given plantation block. Counting the number of mounds is generally conducted through manual field surveys by forestry workers, which is costly and prone to errors, especially for large areas. To address this issue, we present a novel framework exploiting advances in Unmanned Aerial Vehicle (UAV) imaging and computer vision to accurately estimate the number of mounds on a planting block. The proposed framework comprises two main components. First, we exploit a visual recognition method based on a deep learning algorithm for multiple object detection by pixel-based segmentation. This enables a preliminary count of visible mounds, as well as other frequently seen objects (e.g. trees, debris, accumulation of water), to be used to characterize the planting block. Second, since visual recognition could limited by several perturbation factors (e.g. mound erosion, occlusion), we employ a machine learning estimation function that predicts the final number of mounds based on the local block properties extracted in the first stage. We evaluate the proposed framework on a new UAV dataset representing numerous planting blocks with varying features. The proposed method outperformed manual counting methods in terms of relative counting precision, indicating that it has the potential to be advantageous and efficient in difficult situations.
2022-09-06 18:05:35 - Handcrafted Feature Selection Techniques for Pattern Recognition: A Survey - *Alysson Ribeiro da Silva, Camila Guedes Silveira* - `2209.02746v1` - [abs](http://arxiv.org/abs/2209.02746v1) - [pdf](http://arxiv.org/pdf/2209.02746v1) > The accuracy of a classifier, when performing Pattern recognition, is mostly tied to the quality and representativeness of the input feature vector. Feature Selection is a process that allows for representing information properly and may increase the accuracy of a classifier. This process is responsible for finding the best possible features, thus allowing us to identify to which class a pattern belongs. Feature selection methods can be categorized as Filters, Wrappers, and Embed. This paper presents a survey on some Filters and Wrapper methods for handcrafted feature selection. Some discussions, with regard to the data structure, processing time, and ability to well represent a feature vector, are also provided in order to explicitly show how appropriate some methods are in order to perform feature selection. Therefore, the presented feature selection methods can be accurate and efficient if applied considering their positives and negatives, finding which one fits best the problem's domain may be the hardest task.
2022-09-06 18:44:45 - A Comprehensive Survey on Radio Frequency (RF) Fingerprinting: Traditional Approaches, Deep Learning, and Open Challenges - *Anu Jagannath, Jithin Jagannath, Prem Sagar Pattanshetty Vasanth Kumar* - `2201.00680v3` - [abs](http://arxiv.org/abs/2201.00680v3) - [pdf](http://arxiv.org/pdf/2201.00680v3) > Fifth generation (5G) network and beyond envision massive Internet of Things (IoT) rollout to support disruptive applications such as extended reality (XR), augmented/virtual reality (AR/VR), industrial automation, autonomous driving, and smart everything which brings together massive and diverse IoT devices occupying the radio frequency (RF) spectrum. Along with the spectrum crunch and throughput challenges, such a massive scale of wireless devices exposes unprecedented threat surfaces. RF fingerprinting is heralded as a candidate technology that can be combined with cryptographic and zero-trust security measures to ensure data privacy, confidentiality, and integrity in wireless networks. Motivated by the relevance of this subject in the future communication networks, in this work, we present a comprehensive survey of RF fingerprinting approaches ranging from a traditional view to the most recent deep learning (DL)-based algorithms. Existing surveys have mostly focused on a constrained presentation of the wireless fingerprinting approaches, however, many aspects remain untold. In this work, however, we mitigate this by addressing every aspect - background on signal intelligence (SIGINT), applications, relevant DL algorithms, systematic literature review of RF fingerprinting techniques spanning the past two decades, discussion on datasets, and potential research avenues - necessary to elucidate this topic to the reader in an encyclopedic manner.
2022-09-06 20:32:24 - Use and Misuse of Machine Learning in Anthropology - *Jeff Calder, Reed Coil, Annie Melton, Peter J. Olver, Gilbert Tostevin, Katrina Yezzi-Woodley* - `2209.02811v1` - [abs](http://arxiv.org/abs/2209.02811v1) - [pdf](http://arxiv.org/pdf/2209.02811v1) > Machine learning (ML), being now widely accessible to the research community at large, has fostered a proliferation of new and striking applications of these emergent mathematical techniques across a wide range of disciplines. In this paper, we will focus on a particular case study: the field of paleoanthropology, which seeks to understand the evolution of the human species based on biological and cultural evidence. As we will show, the easy availability of ML algorithms and lack of expertise on their proper use among the anthropological research community has led to foundational misapplications that have appeared throughout the literature. The resulting unreliable results not only undermine efforts to legitimately incorporate ML into anthropological research, but produce potentially faulty understandings about our human evolutionary and behavioral past. The aim of this paper is to provide a brief introduction to some of the ways in which ML has been applied within paleoanthropology; we also include a survey of some basic ML algorithms for those who are not fully conversant with the field, which remains under active development. We discuss a series of missteps, errors, and violations of correct protocols of ML methods that appear disconcertingly often within the accumulating body of anthropological literature. These mistakes include use of outdated algorithms and practices; inappropriate train/test splits, sample composition, and textual explanations; as well as an absence of transparency due to the lack of data/code sharing, and the subsequent limitations imposed on independent replication. We assert that expanding samples, sharing data and code, re-evaluating approaches to peer review, and, most importantly, developing interdisciplinary teams that include experts in ML are all necessary for progress in future research incorporating ML within anthropology.
2022-09-07 01:17:47 - Survey of Aspect-based Sentiment Analysis Datasets - *Siva Uday Sampreeth Chebolu, Franck Dernoncourt, Nedim Lipka, Thamar Solorio* - `2204.05232v2` - [abs](http://arxiv.org/abs/2204.05232v2) - [pdf](http://arxiv.org/pdf/2204.05232v2) > Aspect-based sentiment analysis (ABSA) is a natural language processing problem that requires analyzing user-generated reviews in order to determine: a) The target entity being reviewed, b) The high-level aspect to which it belongs, and c) The sentiment expressed toward the targets and the aspects. Numerous yet scattered corpora for ABSA make it difficult for researchers to quickly identify corpora best suited for a specific ABSA subtask. This study aims to present a database of corpora that can be used to train and assess autonomous ABSA systems. Additionally, we provide an overview of the major corpora concerning the various ABSA and its subtasks and highlight several corpus features that researchers should consider when selecting a corpus. We conclude that further large-scale ABSA corpora are required. Additionally, because each corpus is constructed differently, it is time-consuming for researchers to experiment with a novel ABSA algorithm on many corpora and often employ just one or a few corpora. The field would benefit from an agreement on a data standard for ABSA corpora. Finally, we discuss the advantages and disadvantages of current collection approaches and make recommendations for future ABSA dataset gathering.
2022-09-07 03:28:48 - Taking a Language Detour: How International Migrants Speaking a Minority Language Seek COVID-Related Information in Their Host Countries - *Ge Gao, Jian Zheng, Eun Kyoung Choe, Naomi Yamashita* - `2209.02903v1` - [abs](http://arxiv.org/abs/2209.02903v1) - [pdf](http://arxiv.org/pdf/2209.02903v1) > Information seeking is crucial for people's self-care and wellbeing in times of public crises. Extensive research has investigated empirical understandings as well as technical solutions to facilitate information seeking by domestic citizens of affected regions. However, limited knowledge is established to support international migrants who need to survive a crisis in their host countries. The current paper presents an interview study with two cohorts of Chinese migrants living in Japan (N=14) and the United States (N=14). Participants reflected on their information seeking experiences during the COVID pandemic. The reflection was supplemented by two weeks of self-tracking where participants maintained records of their COVIDrelated information seeking practice. Our data indicated that participants often took language detours, or visits to Mandarin resources for information about the COVID outbreak in their host countries. They also made strategic use of the Mandarin information to perform selective reading, cross-checking, and contextualized interpretation of COVID-related information in Japanese or English. While such practices enhanced participants' perceived effectiveness of COVID-related information gathering and sensemaking, they disadvantaged people through sometimes incognizant ways. Further, participants lacked the awareness or preference to review migrant-oriented information that was issued by the host country's public authorities despite its availability. Building upon these findings, we discussed solutions to improve international migrants' COVID-related information seeking in their non-native language and cultural environment. We advocated inclusive crisis infrastructures that would engage people with diverse levels of local language fluency, information literacy, and experience in leveraging public services.
2022-09-07 07:00:13 - From Theories on Styles to their Transfer in Text: Bridging the Gap with a Hierarchical Survey - *Enrica Troiano, Aswathy Velutharambath, Roman Klinger* - `2110.15871v5` - [abs](http://arxiv.org/abs/2110.15871v5) - [pdf](http://arxiv.org/pdf/2110.15871v5) > Humans are naturally endowed with the ability to write in a particular style. They can, for instance, re-phrase a formal letter in an informal way, convey a literal message with the use of figures of speech or edit a novel by mimicking the style of some well-known authors. Automating this form of creativity constitutes the goal of style transfer. As a natural language generation task, style transfer aims at rewriting existing texts, and specifically, it creates paraphrases that exhibit some desired stylistic attributes. From a practical perspective, it envisions beneficial applications, like chatbots that modulate their communicative style to appear empathetic, or systems that automatically simplify technical articles for a non-expert audience. Several style-aware paraphrasing methods have attempted to tackle style transfer. A handful of surveys give a methodological overview of the field, but they do not support researchers to focus on specific styles. With this paper, we aim at providing a comprehensive discussion of the styles that have received attention in the transfer task. We organize them in a hierarchy, highlighting the challenges for the definition of each of them, and pointing out gaps in the current research landscape. The hierarchy comprises two main groups. One encompasses styles that people modulate arbitrarily, along the lines of registers and genres. The other group corresponds to unintentionally expressed styles, due to an author's personal characteristics. Hence, our review shows how these groups relate to one another, and where specific styles, including some that have not yet been explored, belong in the hierarchy. Moreover, we summarize the methods employed for different stylistic families, hinting researchers towards those that would be the most fitting for future research.
2022-09-07 08:27:10 - A Survey on Automated Diagnosis of Alzheimer's Disease Using Optical Coherence Tomography and Angiography - *Yasemin Turkan, F. Boray Tek* - `2209.03354v1` - [abs](http://arxiv.org/abs/2209.03354v1) - [pdf](http://arxiv.org/pdf/2209.03354v1) > Retinal optical coherence tomography (OCT) and optical coherence tomography angiography (OCTA) are promising tools for the (early) diagnosis of Alzheimer's disease (AD). These non-invasive imaging techniques are cost-effective and more accessible than alternative neuroimaging tools. However, interpreting and classifying multi-slice scans produced by OCT devices is time-consuming and challenging even for trained practitioners. There are surveys on machine learning and deep learning approaches concerning the automated analysis of OCT scans for various diseases such as glaucoma. However, the current literature lacks an extensive survey on the diagnosis of Alzheimer's disease or cognitive impairment using OCT or OCTA. This has motivated us to do a comprehensive survey aimed at machine/deep learning scientists or practitioners who require an introduction to the problem. The paper contains 1) an introduction to the medical background of Alzheimer's Disease and Cognitive Impairment and their diagnosis using OCT and OCTA imaging modalities, 2) a review of various technical proposals for the problem and the sub-problems from an automated analysis perspective, 3) a systematic review of the recent deep learning studies and available OCT/OCTA datasets directly aimed at the diagnosis of Alzheimer's Disease and Cognitive Impairment. For the latter, we used Publish or Perish Software to search for the relevant studies from various sources such as Scopus, PubMed, and Web of Science. We followed the PRISMA approach to screen an initial pool of 3073 references and determined ten relevant studies (N=10, out of 3073) that directly targeted AD diagnosis. We identified the lack of open OCT/OCTA datasets (about Alzheimer's disease) as the main issue that is impeding the progress in the field.
2022-09-07 09:09:33 - Biblio-Analysis of Cohort Intelligence (CI) Algorithm and its allied applications from Scopus and Web of Science Perspective - *Ishaan Kale, Rahul Joshi, Kalyani Kadam* - `2209.03009v1` - [abs](http://arxiv.org/abs/2209.03009v1) - [pdf](http://arxiv.org/pdf/2209.03009v1) > Cohort Intelligence or CI is one of its kind of novel optimization algorithm. Since its inception, in a very short span it is applied successfully in various domains and its results are observed to be effectual in contrast to algorithm of its kind. Till date, there is no such type of bibliometric analysis carried out on CI and its related applications. So, this research paper in a way will be an ice breaker for those who want to take up CI to a new level. In this research papers, CI publications available in Scopus are analyzed through graphs, networked diagrams about authors, source titles, keywords over the years, journals over the time. In a way this bibliometric paper showcase CI, its applications and detail outs systematic review in terms its bibliometric details.
2022-09-07 11:28:50 - Plant Species Classification Using Transfer Learning by Pretrained Classifier VGG-19 - *Thiru Siddharth, Bhupendra Singh Kirar, Dheeraj Kumar Agrawal* - `2209.03076v1` - [abs](http://arxiv.org/abs/2209.03076v1) - [pdf](http://arxiv.org/pdf/2209.03076v1) > Deep learning is currently the most important branch of machine learning, with applications in speech recognition, computer vision, image classification, and medical imaging analysis. Plant recognition is one of the areas where image classification can be used to identify plant species through their leaves. Botanists devote a significant amount of time to recognizing plant species by personally inspecting. This paper describes a method for dissecting color images of Swedish leaves and identifying plant species. To achieve higher accuracy, the task is completed using transfer learning with the help of pre-trained classifier VGG-19. The four primary processes of classification are image preprocessing, image augmentation, feature extraction, and recognition, which are performed as part of the overall model evaluation. The VGG-19 classifier grasps the characteristics of leaves by employing pre-defined hidden layers such as convolutional layers, max pooling layers, and fully connected layers, and finally uses the soft-max layer to generate a feature representation for all plant classes. The model obtains knowledge connected to aspects of the Swedish leaf dataset, which contains fifteen tree classes, and aids in predicting the proper class of an unknown plant with an accuracy of 99.70% which is higher than previous research works reported.
2022-09-07 11:29:30 - Tackling problems, harvesting benefits -- A systematic review of the regulatory debate around AI - *Anja Folberth, Jutta Jahnel, Jascha Bareis, Carsten Orwat, Christian Wadephul* - `2209.05468v1` - [abs](http://arxiv.org/abs/2209.05468v1) - [pdf](http://arxiv.org/pdf/2209.05468v1) > How to integrate an emerging and all-pervasive technology such as AI into the structures and operations of our society is a question of contemporary politics, science and public debate. It has produced a considerable amount of international academic literature from different disciplines. This article analyzes the academic debate around the regulation of artificial intelligence (AI). The systematic review comprises a sample of 73 peer-reviewed journal articles published between January 1st, 2016, and December 31st, 2020. The analysis concentrates on societal risks and harms, questions of regulatory responsibility, and possible adequate policy frameworks, including risk-based and principle-based approaches. The main interests are proposed regulatory approaches and instruments. Various forms of interventions such as bans, approvals, standard-setting, and disclosure are presented. The assessments of the included papers indicate the complexity of the field, which shows its prematurity and the remaining lack of clarity. By presenting a structured analysis of the academic debate, we contribute both empirically and conceptually to a better understanding of the nexus of AI and regulation and the underlying normative decisions. A comparison of the scientific proposals with the proposed European AI regulation illustrates the specific approach of the regulation, its strengths and weaknesses.
2022-09-07 14:02:16 - Explainable Artificial Intelligence to Detect Image Spam Using Convolutional Neural Network - *Zhibo Zhang, Ernesto Damiani, Hussam Al Hamadi, Chan Yeob Yeun, Fatma Taher* - `2209.03166v1` - [abs](http://arxiv.org/abs/2209.03166v1) - [pdf](http://arxiv.org/pdf/2209.03166v1) > Image spam threat detection has continually been a popular area of research with the internet's phenomenal expansion. This research presents an explainable framework for detecting spam images using Convolutional Neural Network(CNN) algorithms and Explainable Artificial Intelligence (XAI) algorithms. In this work, we use CNN model to classify image spam respectively whereas the post-hoc XAI methods including Local Interpretable Model Agnostic Explanation (LIME) and Shapley Additive Explanations (SHAP) were deployed to provide explanations for the decisions that the black-box CNN models made about spam image detection. We train and then evaluate the performance of the proposed approach on a 6636 image dataset including spam images and normal images collected from three different publicly available email corpora. The experimental results show that the proposed framework achieved satisfactory detection results in terms of different performance metrics whereas the model-independent XAI algorithms could provide explanations for the decisions of different models which could be utilized for comparison for the future study.
2022-09-07 16:11:31 - VulCurator: A Vulnerability-Fixing Commit Detector - *Truong Giang Nguyen, Thanh Le-Cong, Hong Jin Kang, Xuan-Bach D. Le, David Lo* - `2209.03260v1` - [abs](http://arxiv.org/abs/2209.03260v1) - [pdf](http://arxiv.org/pdf/2209.03260v1) > Open-source software (OSS) vulnerability management process is important nowadays, as the number of discovered OSS vulnerabilities is increasing over time. Monitoring vulnerability-fixing commits is a part of the standard process to prevent vulnerability exploitation. Manually detecting vulnerability-fixing commits is, however, time consuming due to the possibly large number of commits to review. Recently, many techniques have been proposed to automatically detect vulnerability-fixing commits using machine learning. These solutions either: (1) did not use deep learning, or (2) use deep learning on only limited sources of information. This paper proposes VulCurator, a tool that leverages deep learning on richer sources of information, including commit messages, code changes and issue reports for vulnerability-fixing commit classifica- tion. Our experimental results show that VulCurator outperforms the state-of-the-art baselines up to 16.1% in terms of F1-score. VulCurator tool is publicly available at https://github.com/ntgiang71096/VFDetector and https://zenodo.org/record/7034132#.Yw3MN-xBzDI, with a demo video at https://youtu.be/uMlFmWSJYOE.
2022-09-07 16:59:03 - Geometric multimodal representation learning - *Yasha Ektefaie, George Dasoulas, Ayush Noori, Maha Farhat, Marinka Zitnik* - `2209.03299v1` - [abs](http://arxiv.org/abs/2209.03299v1) - [pdf](http://arxiv.org/pdf/2209.03299v1) > Graph-centric artificial intelligence (graph AI) has achieved remarkable success in modeling interacting systems prevalent in nature, from dynamical systems in biology to particle physics. The increasing heterogeneity of data calls for graph neural architectures that can combine multiple inductive biases. However, combining data from various sources is challenging because appropriate inductive bias may vary by data modality. Multimodal learning methods fuse multiple data modalities while leveraging cross-modal dependencies to address this challenge. Here, we survey 140 studies in graph-centric AI and realize that diverse data types are increasingly brought together using graphs and fed into sophisticated multimodal models. These models stratify into image-, language-, and knowledge-grounded multimodal learning. We put forward an algorithmic blueprint for multimodal graph learning based on this categorization. The blueprint serves as a way to group state-of-the-art architectures that treat multimodal data by choosing appropriately four different components. This effort can pave the way for standardizing the design of sophisticated multimodal architectures for highly complex real-world problems.
2022-09-07 18:33:45 - A Survey of Neural Trees - *Haoling Li, Jie Song, Mengqi Xue, Haofei Zhang, Jingwen Ye, Lechao Cheng, Mingli Song* - `2209.03415v1` - [abs](http://arxiv.org/abs/2209.03415v1) - [pdf](http://arxiv.org/pdf/2209.03415v1) > Neural networks (NNs) and decision trees (DTs) are both popular models of machine learning, yet coming with mutually exclusive advantages and limitations. To bring the best of the two worlds, a variety of approaches are proposed to integrate NNs and DTs explicitly or implicitly. In this survey, these approaches are organized in a school which we term as neural trees (NTs). This survey aims to present a comprehensive review of NTs and attempts to identify how they enhance the model interpretability. We first propose a thorough taxonomy of NTs that expresses the gradual integration and co-evolution of NNs and DTs. Afterward, we analyze NTs in terms of their interpretability and performance, and suggest possible solutions to the remaining challenges. Finally, this survey concludes with a discussion about other considerations like conditional computation and promising directions towards this field. A list of papers reviewed in this survey, along with their corresponding codes, is available at: https://github.com/zju-vipa/awesome-neural-trees
2022-09-07 20:20:53 - Extend and Explain: Interpreting Very Long Language Models - *Joel Stremmel, Brian L. Hill, Jeffrey Hertzberg, Jaime Murillo, Llewelyn Allotey, Eran Halperin* - `2209.01174v2` - [abs](http://arxiv.org/abs/2209.01174v2) - [pdf](http://arxiv.org/pdf/2209.01174v2) > While Transformer language models (LMs) are state-of-the-art for information extraction, long text introduces computational challenges requiring suboptimal preprocessing steps or alternative model architectures. Sparse-attention LMs can represent longer sequences, overcoming performance hurdles. However, it remains unclear how to explain predictions from these models, as not all tokens attend to each other in the self-attention layers, and long sequences pose computational challenges for explainability algorithms when runtime depends on document length. These challenges are severe in the medical context where documents can be very long, and machine learning (ML) models must be auditable and trustworthy. We introduce a novel Masked Sampling Procedure (MSP) to identify the text blocks that contribute to a prediction, apply MSP in the context of predicting diagnoses from medical text, and validate our approach with a blind review by two clinicians. Our method identifies about 1.7x more clinically informative text blocks than the previous state-of-the-art, runs up to 100x faster, and is tractable for generating important phrase pairs. MSP is particularly well-suited to long LMs but can be applied to any text classifier. We provide a general implementation of MSP.
2022-09-08 02:16:43 - Image Feature Information Extraction for Interest Point Detection: A Review - *Junfeng Jing, Tian Gao, Weichuan Zhang, Yongsheng Gao, Changming Sun* - `2106.07929v5` - [abs](http://arxiv.org/abs/2106.07929v5) - [pdf](http://arxiv.org/pdf/2106.07929v5) > Interest point detection is one of the most fundamental and critical problems in computer vision and image processing. In this paper, we carry out a comprehensive review on image feature information (IFI) extraction techniques for interest point detection. To systematically introduce how the existing interest point detection methods extract IFI from an input image, we propose a taxonomy of the IFI extraction techniques for interest point detection. According to this taxonomy, we discuss different types of IFI extraction techniques for interest point detection. Furthermore, we identify the main unresolved issues related to the existing IFI extraction techniques for interest point detection and any interest point detection methods that have not been discussed before. The existing popular datasets and evaluation standards are provided and the performances for eighteen state-of-the-art approaches are evaluated and discussed. Moreover, future research directions on IFI extraction techniques for interest point detection are elaborated.
2022-09-08 03:05:54 - Differential Privacy and Fairness in Decisions and Learning Tasks: A Survey - *Ferdinando Fioretto, Cuong Tran, Pascal Van Hentenryck, Keyu Zhu* - `2202.08187v2` - [abs](http://arxiv.org/abs/2202.08187v2) - [pdf](http://arxiv.org/pdf/2202.08187v2) > This paper surveys recent work in the intersection of differential privacy (DP) and fairness. It reviews the conditions under which privacy and fairness may have aligned or contrasting goals, analyzes how and why DP may exacerbate bias and unfairness in decision problems and learning tasks, and describes available mitigation measures for the fairness issues arising in DP systems. The survey provides a unified understanding of the main challenges and potential risks arising when deploying privacy-preserving machine-learning or decisions-making tasks under a fairness lens.
2022-09-08 04:17:32 - Depression Symptoms Modelling from Social Media Text: An Active Learning Approach - *Nawshad Farruque, Randy Goebel, Sudhakar Sivapalan, Osmar Zaiane* - `2209.02765v2` - [abs](http://arxiv.org/abs/2209.02765v2) - [pdf](http://arxiv.org/pdf/2209.02765v2) > A fundamental component of user-level social media language based clinical depression modelling is depression symptoms detection (DSD). Unfortunately, there does not exist any DSD dataset that reflects both the clinical insights and the distribution of depression symptoms from the samples of self-disclosed depressed population. In our work, we describe an Active Learning (AL) framework which uses an initial supervised learning model that leverages 1) a state-of-the-art large mental health forum text pre-trained language model further fine-tuned on a clinician annotated DSD dataset, 2) a Zero-Shot learning model for DSD, and couples them together to harvest depression symptoms related samples from our large self-curated Depression Tweets Repository (DTR). Our clinician annotated dataset is the largest of its kind. Furthermore, DTR is created from the samples of tweets in self-disclosed depressed users Twitter timeline from two datasets, including one of the largest benchmark datasets for user-level depression detection from Twitter. This further helps preserve the depression symptoms distribution of self-disclosed Twitter users tweets. Subsequently, we iteratively retrain our initial DSD model with the harvested data. We discuss the stopping criteria and limitations of this AL process, and elaborate the underlying constructs which play a vital role in the overall AL process. We show that we can produce a final dataset which is the largest of its kind. Furthermore, a DSD and a Depression Post Detection (DPD) model trained on it achieves significantly better accuracy than their initial version.
2022-09-08 06:08:48 - Conformal Methods for Quantifying Uncertainty in Spatiotemporal Data: A Survey - *Sophia Sun* - `2209.03580v1` - [abs](http://arxiv.org/abs/2209.03580v1) - [pdf](http://arxiv.org/pdf/2209.03580v1) > Machine learning methods are increasingly widely used in high-risk settings such as healthcare, transportation, and finance. In these settings, it is important that a model produces calibrated uncertainty to reflect its own confidence and avoid failures. In this paper we survey recent works on uncertainty quantification (UQ) for deep learning, in particular distribution-free Conformal Prediction method for its mathematical properties and wide applicability. We will cover the theoretical guarantees of conformal methods, introduce techniques that improve calibration and efficiency for UQ in the context of spatiotemporal data, and discuss the role of UQ in the context of safe decision making.
2022-09-08 08:21:18 - A Survey on Data Augmentation for Text Classification - *Markus Bayer, Marc-André Kaufhold, Christian Reuter* - `2107.03158v6` - [abs](http://arxiv.org/abs/2107.03158v6) - [pdf](http://arxiv.org/pdf/2107.03158v6) > Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing a model's generalization capabilities, it can also address many other challenges and problems, from overcoming a limited amount of training data, to regularizing the objective, to limiting the amount data used to protect privacy. Based on a precise description of the goals and applications of data augmentation and a taxonomy for existing works, this survey is concerned with data augmentation methods for textual classification and aims to provide a concise and comprehensive overview for researchers and practitioners. Derived from the taxonomy, we divide more than 100 methods into 12 different groupings and give state-of-the-art references expounding which methods are highly promising by relating them to each other. Finally, research perspectives that may constitute a building block for future work are provided.
2022-09-08 10:12:21 - A Review on Method Entities in the Academic Literature: Extraction, Evaluation, and Application - *Yuzhuo Wang, Chengzhi Zhang, Kai Li* - `2209.03687v1` - [abs](http://arxiv.org/abs/2209.03687v1) - [pdf](http://arxiv.org/pdf/2209.03687v1) > In scientific research, the method is an indispensable means to solve scientific problems and a critical research object. With the advancement of sciences, many scientific methods are being proposed, modified, and used in academic literature. The authors describe details of the method in the abstract and body text, and key entities in academic literature reflecting names of the method are called method entities. Exploring diverse method entities in a tremendous amount of academic literature helps scholars understand existing methods, select the appropriate method for research tasks, and propose new methods. Furthermore, the evolution of method entities can reveal the development of a discipline and facilitate knowledge discovery. Therefore, this article offers a systematic review of methodological and empirical works focusing on extracting method entities from full-text academic literature and efforts to build knowledge services using these extracted method entities. Definitions of key concepts involved in this review were first proposed. Based on these definitions, we systematically reviewed the approaches and indicators to extract and evaluate method entities, with a strong focus on the pros and cons of each approach. We also surveyed how extracted method entities are used to build new applications. Finally, limitations in existing works as well as potential next steps were discussed.
2022-09-08 10:23:16 - Exploring the Distribution Regularities of User Attention and Sentiment toward Product Aspects in Online Reviews - *Chenglei Qin, Chengzhi Zhang, Yi Bu* - `2209.03690v1` - [abs](http://arxiv.org/abs/2209.03690v1) - [pdf](http://arxiv.org/pdf/2209.03690v1) > [Purpose] To better understand the online reviews and help potential consumers, businessmen, and product manufacturers effectively obtain users' evaluation on product aspects, this paper explores the distribution regularities of user attention and sentiment toward product aspects from the temporal perspective of online reviews. [Design/methodology/approach] Temporal characteristics of online reviews (purchase time, review time, and time intervals between purchase time and review time), similar attributes clustering, and attribute-level sentiment computing technologies are employed based on more than 340k smartphone reviews of three products from JD.COM (a famous online shopping platform in China) to explore the distribution regularities of user attention and sentiment toward product aspects in this article. [Findings] The empirical results show that a power-law distribution can fit user attention to product aspects, and the reviews posted in short time intervals contain more product aspects. Besides, the results show that the values of user sentiment of product aspects are significantly higher/lower in short time intervals which contribute to judging the advantages and weaknesses of a product. [Research limitations] The paper can't acquire online reviews for more products with temporal characteristics to verify the findings because of the restriction on reviews crawling by the shopping platforms. [Originality/value] This work reveals the distribution regularities of user attention and sentiment toward product aspects, which is of great significance in assisting decision-making, optimizing review presentation, and improving the shopping experience.
2022-09-08 12:07:12 - Automatic fetal fat quantification from MRI - *Netanell Avisdris, Aviad Rabinowich, Daniel Fridkin, Ayala Zilberman, Sapir Lazar, Jacky Herzlich, Zeev Hananis, Daphna Link-Sourani, Liat Ben-Sira, Liran Hiersch, Dafna Ben Bashat, Leo Joskowicz* - `2209.03748v1` - [abs](http://arxiv.org/abs/2209.03748v1) - [pdf](http://arxiv.org/pdf/2209.03748v1) > Normal fetal adipose tissue (AT) development is essential for perinatal well-being. AT, or simply fat, stores energy in the form of lipids. Malnourishment may result in excessive or depleted adiposity. Although previous studies showed a correlation between the amount of AT and perinatal outcome, prenatal assessment of AT is limited by lacking quantitative methods. Using magnetic resonance imaging (MRI), 3D fat- and water-only images of the entire fetus can be obtained from two point Dixon images to enable AT lipid quantification. This paper is the first to present a methodology for developing a deep learning based method for fetal fat segmentation based on Dixon MRI. It optimizes radiologists' manual fetal fat delineation time to produce annotated training dataset. It consists of two steps: 1) model-based semi-automatic fetal fat segmentations, reviewed and corrected by a radiologist; 2) automatic fetal fat segmentation using DL networks trained on the resulting annotated dataset. Three DL networks were trained. We show a significant improvement in segmentation times (3:38 hours to < 1 hour) and observer variability (Dice of 0.738 to 0.906) compared to manual segmentation. Automatic segmentation of 24 test cases with the 3D Residual U-Net, nn-UNet and SWIN-UNetR transformer networks yields a mean Dice score of 0.863, 0.787 and 0.856, respectively. These results are better than the manual observer variability, and comparable to automatic adult and pediatric fat segmentation. A radiologist reviewed and corrected six new independent cases segmented using the best performing network, resulting in a Dice score of 0.961 and a significantly reduced correction time of 15:20 minutes. Using these novel segmentation methods and short MRI acquisition time, whole body subcutaneous lipids can be quantified for individual fetuses in the clinic and large-cohort research.
2022-09-08 14:58:50 - A Survey on Large-Population Systems and Scalable Multi-Agent Reinforcement Learning - *Kai Cui, Anam Tahir, Gizem Ekinci, Ahmed Elshamanhory, Yannick Eich, Mengguang Li, Heinz Koeppl* - `2209.03859v1` - [abs](http://arxiv.org/abs/2209.03859v1) - [pdf](http://arxiv.org/pdf/2209.03859v1) > The analysis and control of large-population systems is of great interest to diverse areas of research and engineering, ranging from epidemiology over robotic swarms to economics and finance. An increasingly popular and effective approach to realizing sequential decision-making in multi-agent systems is through multi-agent reinforcement learning, as it allows for an automatic and model-free analysis of highly complex systems. However, the key issue of scalability complicates the design of control and reinforcement learning algorithms particularly in systems with large populations of agents. While reinforcement learning has found resounding empirical success in many scenarios with few agents, problems with many agents quickly become intractable and necessitate special consideration. In this survey, we will shed light on current approaches to tractably understanding and analyzing large-population systems, both through multi-agent reinforcement learning and through adjacent areas of research such as mean-field games, collective intelligence, or complex network theory. These classically independent subject areas offer a variety of approaches to understanding or modeling large-population systems, which may be of great use for the formulation of tractable MARL algorithms in the future. Finally, we survey potential areas of application for large-scale control and identify fruitful future applications of learning algorithms in practical systems. We hope that our survey could provide insight and future directions to junior and senior researchers in theoretical and applied sciences alike.
2022-09-08 18:15:35 - Contextualizing Artificially Intelligent Morality: A Meta-Ethnography of Top-Down, Bottom-Up, and Hybrid Models for Theoretical and Applied Ethics in Artificial Intelligence - *Jennafer S. Roberts, Laura N. Montoya* - `2204.07612v2` - [abs](http://arxiv.org/abs/2204.07612v2) - [pdf](http://arxiv.org/pdf/2204.07612v2) > In this meta-ethnography, we explore three different angles of ethical artificial intelligence (AI) design implementation including the philosophical ethical viewpoint, the technical perspective, and framing through a political lens. Our qualitative research includes a literature review that highlights the cross-referencing of these angles by discussing the value and drawbacks of contrastive top-down, bottom-up, and hybrid approaches previously published. The novel contribution to this framework is the political angle, which constitutes ethics in AI either being determined by corporations and governments and imposed through policies or law (coming from the top), or ethics being called for by the people (coming from the bottom), as well as top-down, bottom-up, and hybrid technicalities of how AI is developed within a moral construct and in consideration of its users, with expected and unexpected consequences and long-term impact in the world. There is a focus on reinforcement learning as an example of a bottom-up applied technical approach and AI ethics principles as a practical top-down approach. This investigation includes real-world case studies to impart a global perspective, as well as philosophical debate on the ethics of AI and theoretical future thought experimentation based on historical facts, current world circumstances, and possible ensuing realities.
2022-09-09 02:50:47 - PoxVerifi: An Information Verification System to Combat Monkeypox Misinformation - *Akaash Kolluri, Kami Vinton, Dhiraj Murthy* - `2209.09300v1` - [abs](http://arxiv.org/abs/2209.09300v1) - [pdf](http://arxiv.org/pdf/2209.09300v1) > Following recent outbreaks, monkeypox-related misinformation continues to rapidly spread online. This negatively impacts response strategies and disproportionately harms LGBTQ+ communities in the short-term, and ultimately undermines the overall effectiveness of public health responses. In an attempt to combat monkeypox-related misinformation, we present PoxVerifi, an open-source, extensible tool that provides a comprehensive approach to assessing the accuracy of monkeypox related claims. Leveraging information from existing fact checking sources and published World Health Organization (WHO) information, we created an open-sourced corpus of 225 rated monkeypox claims. Additionally, we trained an open-sourced BERT-based machine learning model for specifically classifying monkeypox information, which achieved 96% cross-validation accuracy. PoxVerifi is a Google Chrome browser extension designed to empower users to navigate through monkeypox-related misinformation. Specifically, PoxVerifi provides users with a comprehensive toolkit to assess the veracity of headlines on any webpage across the Internet without having to visit an external site. Users can view an automated accuracy review from our trained machine learning model, a user-generated accuracy review based on community-member votes, and have the ability to see similar, vetted, claims. Besides PoxVerifi's comprehensive approach to claim-testing, our platform provides an efficient and accessible method to crowdsource accuracy ratings on monkeypox related-claims, which can be aggregated to create new labeled misinformation datasets.
2022-09-09 02:54:26 - MassMIND: Massachusetts Maritime INfrared Dataset - *Shailesh Nirgudkar, Michael DeFilippo, Michael Sacarny, Michael Benjamin, Paul Robinette* - `2209.04097v1` - [abs](http://arxiv.org/abs/2209.04097v1) - [pdf](http://arxiv.org/pdf/2209.04097v1) > Recent advances in deep learning technology have triggered radical progress in the autonomy of ground vehicles. Marine coastal Autonomous Surface Vehicles (ASVs) that are regularly used for surveillance, monitoring and other routine tasks can benefit from this autonomy. Long haul deep sea transportation activities are additional opportunities. These two use cases present very different terrains -- the first being coastal waters -- with many obstacles, structures and human presence while the latter is mostly devoid of such obstacles. Variations in environmental conditions are common to both terrains. Robust labeled datasets mapping such terrains are crucial in improving the situational awareness that can drive autonomy. However, there are only limited such maritime datasets available and these primarily consist of optical images. Although, Long Wave Infrared (LWIR) is a strong complement to the optical spectrum that helps in extreme light conditions, a labeled public dataset with LWIR images does not currently exist. In this paper, we fill this gap by presenting a labeled dataset of over 2,900 LWIR segmented images captured in coastal maritime environment under diverse conditions. The images are labeled using instance segmentation and classified in seven categories -- sky, water, obstacle, living obstacle, bridge, self and background. We also evaluate this dataset across three deep learning architectures (UNet, PSPNet, DeepLabv3) and provide detailed analysis of its efficacy. While the dataset focuses on the coastal terrain it can equally help deep sea use cases. Such terrain would have less traffic, and the classifier trained on cluttered environment would be able to handle sparse scenes effectively. We share this dataset with the research community with the hope that it spurs new scene understanding capabilities in the maritime environment.
2022-09-09 02:58:36 - Explainability Is in the Mind of the Beholder: Establishing the Foundations of Explainable Artificial Intelligence - *Kacper Sokol, Peter Flach* - `2112.14466v2` - [abs](http://arxiv.org/abs/2112.14466v2) - [pdf](http://arxiv.org/pdf/2112.14466v2) > Explainable artificial intelligence and interpretable machine learning are research domains growing in importance. Yet, the underlying concepts remain somewhat elusive and lack generally agreed definitions. While recent inspiration from social sciences has refocused the work on needs and expectations of human recipients, the field still misses a concrete conceptualisation. We take steps towards addressing this challenge by reviewing the philosophical and social foundations of human explainability, which we then translate into the technological realm. In particular, we scrutinise the notion of algorithmic black boxes and the spectrum of understanding determined by explanatory processes and explainees' background knowledge. This approach allows us to define explainability as (logical) reasoning applied to transparent insights (into, possibly black-box, predictive systems) interpreted under background knowledge and placed within a specific context -- a process that engenders understanding in a selected group of explainees. We then employ this conceptualisation to revisit strategies for evaluating explainability as well as the much disputed trade-off between transparency and predictive power, including its implications for ante-hoc and post-hoc techniques along with fairness and accountability established by explainability. We furthermore discuss components of the machine learning workflow that may be in need of interpretability, building on a range of ideas from human-centred explainability, with a particular focus on explainees, contrastive statements and explanatory processes. Our discussion reconciles and complements current research to help better navigate open questions -- rather than attempting to address any individual issue -- thus laying a solid foundation for a grounded discussion and future progress of explainable artificial intelligence and interpretable machine learning.
2022-09-09 03:12:45 - Controllable Data Generation by Deep Learning: A Review - *Shiyu Wang, Yuanqi Du, Xiaojie Guo, Bo Pan, Liang Zhao* - `2207.09542v3` - [abs](http://arxiv.org/abs/2207.09542v3) - [pdf](http://arxiv.org/pdf/2207.09542v3) > Designing and generating new data under targeted properties has been attracting various critical applications such as molecule design, image editing and speech synthesis. Traditional hand-crafted approaches heavily rely on expertise experience and intensive human efforts, yet still suffer from the insufficiency of scientific knowledge and low throughput to support effective and efficient data generation. Recently, the advancement of deep learning induces expressive methods that can learn the underlying representation and properties of data. Such capability provides new opportunities in figuring out the mutual relationship between the structural patterns and functional properties of the data and leveraging such relationship to generate structural data given the desired properties. This article provides a systematic review of this promising research area, commonly known as controllable deep data generation. Firstly, the potential challenges are raised and preliminaries are provided. Then the controllable deep data generation is formally defined, a taxonomy on various techniques is proposed and the evaluation metrics in this specific domain are summarized. After that, exciting applications of controllable deep data generation are introduced and existing works are experimentally analyzed and compared. Finally, the promising future directions of controllable deep data generation are highlighted and five potential challenges are identified.
2022-09-09 07:40:11 - Metaverse for Healthcare: A Survey on Potential Applications, Challenges and Future Directions - *Rajeswari Chengoden, Nancy Victor, Thien Huynh-The, Gokul Yenduri, Rutvij H. Jhaveri, Mamoun Alazab, Sweta Bhattacharya, Pawan Hegde, Praveen Kumar Reddy Maddikunta, Thippa Reddy Gadekallu* - `2209.04160v1` - [abs](http://arxiv.org/abs/2209.04160v1) - [pdf](http://arxiv.org/pdf/2209.04160v1) > The rapid progress in digitalization and automation have led to an accelerated growth in healthcare, generating novel models that are creating new channels for rendering treatment with reduced cost. The Metaverse is an emerging technology in the digital space which has huge potential in healthcare, enabling realistic experiences to the patients as well as the medical practitioners. The Metaverse is a confluence of multiple enabling technologies such as artificial intelligence, virtual reality, augmented reality, internet of medical devices, robotics, quantum computing, etc. through which new directions for providing quality healthcare treatment and services can be explored. The amalgamation of these technologies ensures immersive, intimate and personalized patient care. It also provides adaptive intelligent solutions that eliminates the barriers between healthcare providers and receivers. This article provides a comprehensive review of the Metaverse for healthcare, emphasizing on the state of the art, the enabling technologies for adopting the Metaverse for healthcare, the potential applications and the related projects. The issues in the adaptation of the Metaverse for healthcare applications are also identified and the plausible solutions are highlighted as part of future research directions.
2022-09-09 11:59:42 - Location-Routing Planning for Last-Mile Deliveries Using Mobile Parcel Lockers: A Hybrid Q-Learning Network Approach - *Yubin Liu, Qiming Ye, Jose Escribano-Macias, Yuxiang Feng, Panagiotis Angeloudis* - `2209.04265v1` - [abs](http://arxiv.org/abs/2209.04265v1) - [pdf](http://arxiv.org/pdf/2209.04265v1) > Mobile parcel lockers (MPLs) have been recently proposed by logistics operators as a technology that could help reduce traffic congestion and operational costs in urban freight distribution. Given their ability to relocate throughout their area of deployment, they hold the potential to improve customer accessibility and convenience. In this study, we formulate the Mobile Parcel Locker Problem (MPLP), a special case of the Location-Routing Problem (LRP) which determines the optimal stopover location for MPLs throughout the day and plans corresponding delivery routes. A Hybrid Q-Learning-Network-based Method (HQM) is developed to resolve the computational complexity of the resulting large problem instances while escaping local optima. In addition, the HQM is integrated with global and local search mechanisms to resolve the dilemma of exploration and exploitation faced by classic reinforcement learning (RL) methods. We examine the performance of HQM under different problem sizes (up to 200 nodes) and benchmarked it against the Genetic Algorithm (GA). Our results indicate that the average reward obtained by HQM is 1.96 times greater than GA, which demonstrates that HQM has a better optimisation ability. Finally, we identify critical factors that contribute to fleet size requirements, travel distances, and service delays. Our findings outline that the efficiency of MPLs is mainly contingent on the length of time windows and the deployment of MPL stopovers.
2022-09-09 13:34:51 - Robust-by-Design Classification via Unitary-Gradient Neural Networks - *Fabio Brau, Giulio Rossolini, Alessandro Biondi, Giorgio Buttazzo* - `2209.04293v1` - [abs](http://arxiv.org/abs/2209.04293v1) - [pdf](http://arxiv.org/pdf/2209.04293v1) > The use of neural networks in safety-critical systems requires safe and robust models, due to the existence of adversarial attacks. Knowing the minimal adversarial perturbation of any input x, or, equivalently, knowing the distance of x from the classification boundary, allows evaluating the classification robustness, providing certifiable predictions. Unfortunately, state-of-the-art techniques for computing such a distance are computationally expensive and hence not suited for online applications. This work proposes a novel family of classifiers, namely Signed Distance Classifiers (SDCs), that, from a theoretical perspective, directly output the exact distance of x from the classification boundary, rather than a probability score (e.g., SoftMax). SDCs represent a family of robust-by-design classifiers. To practically address the theoretical requirements of a SDC, a novel network architecture named Unitary-Gradient Neural Network is presented. Experimental results show that the proposed architecture approximates a signed distance classifier, hence allowing an online certifiable classification of x at the cost of a single inference.
2022-09-09 14:51:13 - Bridging the Gap: Differentially Private Equivariant Deep Learning for Medical Image Analysis - *Florian A. Hölzl, Daniel Rueckert, Georgios Kaissis* - `2209.04338v1` - [abs](http://arxiv.org/abs/2209.04338v1) - [pdf](http://arxiv.org/pdf/2209.04338v1) > Machine learning with formal privacy-preserving techniques like Differential Privacy (DP) allows one to derive valuable insights from sensitive medical imaging data while promising to protect patient privacy, but it usually comes at a sharp privacy-utility trade-off. In this work, we propose to use steerable equivariant convolutional networks for medical image analysis with DP. Their improved feature quality and parameter efficiency yield remarkable accuracy gains, narrowing the privacy-utility gap.
2022-09-09 16:46:37 - Trust Calibration as a Function of the Evolution of Uncertainty in Knowledge Generation: A Survey - *Joshua Boley, Maoyuan Sun* - `2209.04388v1` - [abs](http://arxiv.org/abs/2209.04388v1) - [pdf](http://arxiv.org/pdf/2209.04388v1) > User trust is a crucial consideration in designing robust visual analytics systems that can guide users to reasonably sound conclusions despite inevitable biases and other uncertainties introduced by the human, the machine, and the data sources which paint the canvas upon which knowledge emerges. A multitude of factors emerge upon studied consideration which introduce considerable complexity and exacerbate our understanding of how trust relationships evolve in visual analytics systems, much as they do in intelligent sociotechnical systems. A visual analytics system, however, does not by its nature provoke exactly the same phenomena as its simpler cousins, nor are the phenomena necessarily of the same exact kind. Regardless, both application domains present the same root causes from which the need for trustworthiness arises: Uncertainty and the assumption of risk. In addition, visual analytics systems, even more than the intelligent systems which (traditionally) tend to be closed to direct human input and direction during processing, are influenced by a multitude of cognitive biases that further exacerbate an accounting of the uncertainties that may afflict the user's confidence, and ultimately trust in the system. In this article we argue that accounting for the propagation of uncertainty from data sources all the way through extraction of information and hypothesis testing is necessary to understand how user trust in a visual analytics system evolves over its lifecycle, and that the analyst's selection of visualization parameters affords us a simple means to capture the interactions between uncertainty and cognitive bias as a function of the attributes of the search tasks the analyst executes while evaluating explanations. We sample a broad cross-section of the literature from visual analytics, human cognitive theory, and uncertainty, and attempt to synthesize a useful perspective.
2022-09-09 17:48:36 - Investigation of a Machine learning methodology for the SKA pulsar search pipeline - *Shashank Sanjay Bhat, Prabu Thiagaraj, Ben Stappers, Atul Ghalame, Snehanshu Saha, T. S. B Sudarshan, Zaffirah Hosenie* - `2209.04430v1` - [abs](http://arxiv.org/abs/2209.04430v1) - [pdf](http://arxiv.org/pdf/2209.04430v1) > The SKA pulsar search pipeline will be used for real time detection of pulsars. Modern radio telescopes such as SKA will be generating petabytes of data in their full scale of operation. Hence experience-based and data-driven algorithms become indispensable for applications such as candidate detection. Here we describe our findings from testing a state of the art object detection algorithm called Mask R-CNN to detect candidate signatures in the SKA pulsar search pipeline. We have trained the Mask R-CNN model to detect candidate images. A custom annotation tool was developed to mark the regions of interest in large datasets efficiently. We have successfully demonstrated this algorithm by detecting candidate signatures on a simulation dataset. The paper presents details of this work with a highlight on the future prospects.
2022-09-09 19:37:05 - General Place Recognition Survey: Towards the Real-world Autonomy Age - *Peng Yin, Shiqi Zhao, Ivan Cisneros, Abulikemu Abuduweili, Guoquan Huang, Micheal Milford, Changliu Liu, Howie Choset, Sebastian Scherer* - `2209.04497v1` - [abs](http://arxiv.org/abs/2209.04497v1) - [pdf](http://arxiv.org/pdf/2209.04497v1) > Place recognition is the fundamental module that can assist Simultaneous Localization and Mapping (SLAM) in loop-closure detection and re-localization for long-term navigation. The place recognition community has made astonishing progress over the last $20$ years, and this has attracted widespread research interest and application in multiple fields such as computer vision and robotics. However, few methods have shown promising place recognition performance in complex real-world scenarios, where long-term and large-scale appearance changes usually result in failures. Additionally, there is a lack of an integrated framework amongst the state-of-the-art methods that can handle all of the challenges in place recognition, which include appearance changes, viewpoint differences, robustness to unknown areas, and efficiency in real-world applications. In this work, we survey the state-of-the-art methods that target long-term localization and discuss future directions and opportunities. We start by investigating the formulation of place recognition in long-term autonomy and the major challenges in real-world environments. We then review the recent works in place recognition for different sensor modalities and current strategies for dealing with various place recognition challenges. Finally, we review the existing datasets for long-term localization and introduce our datasets and evaluation API for different approaches. This paper can be a tutorial for researchers new to the place recognition community and those who care about long-term robotics autonomy. We also provide our opinion on the frequently asked question in robotics: Do robots need accurate localization for long-term autonomy? A summary of this work and our datasets and evaluation API is publicly available to the robotics community at: https://github.com/MetaSLAM/GPRS.
2022-09-09 20:21:03 - Calibrating Segmentation Networks with Margin-based Label Smoothing - *Balamurali Murugesan, Bingyuan Liu, Adrian Galdran, Ismail Ben Ayed, Jose Dolz* - `2209.09641v1` - [abs](http://arxiv.org/abs/2209.09641v1) - [pdf](http://arxiv.org/pdf/2209.09641v1) > Despite the undeniable progress in visual recognition tasks fueled by deep neural networks, there exists recent evidence showing that these models are poorly calibrated, resulting in over-confident predictions. The standard practices of minimizing the cross entropy loss during training promote the predicted softmax probabilities to match the one-hot label assignments. Nevertheless, this yields a pre-softmax activation of the correct class that is significantly larger than the remaining activations, which exacerbates the miscalibration problem. Recent observations from the classification literature suggest that loss functions that embed implicit or explicit maximization of the entropy of predictions yield state-of-the-art calibration performances. Despite these findings, the impact of these losses in the relevant task of calibrating medical image segmentation networks remains unexplored. In this work, we provide a unifying constrained-optimization perspective of current state-of-the-art calibration losses. Specifically, these losses could be viewed as approximations of a linear penalty (or a Lagrangian term) imposing equality constraints on logit distances. This points to an important limitation of such underlying equality constraints, whose ensuing gradients constantly push towards a non-informative solution, which might prevent from reaching the best compromise between the discriminative performance and calibration of the model during gradient-based optimization. Following our observations, we propose a simple and flexible generalization based on inequality constraints, which imposes a controllable margin on logit distances. Comprehensive experiments on a variety of public medical image segmentation benchmarks demonstrate that our method sets novel state-of-the-art results on these tasks in terms of network calibration, whereas the discriminative performance is also improved.
2022-09-10 02:51:49 - Applications of Multi-Agent Reinforcement Learning in Future Internet: A Comprehensive Survey - *Tianxu Li, Kun Zhu, Nguyen Cong Luong, Dusit Niyato, Qihui Wu, Yang Zhang, Bing Chen* - `2110.13484v3` - [abs](http://arxiv.org/abs/2110.13484v3) - [pdf](http://arxiv.org/pdf/2110.13484v3) > Future Internet involves several emerging technologies such as 5G and beyond 5G networks, vehicular networks, unmanned aerial vehicle (UAV) networks, and Internet of Things (IoTs). Moreover, future Internet becomes heterogeneous and decentralized with a large number of involved network entities. Each entity may need to make its local decision to improve the network performance under dynamic and uncertain network environments. Standard learning algorithms such as single-agent Reinforcement Learning (RL) or Deep Reinforcement Learning (DRL) have been recently used to enable each network entity as an agent to learn an optimal decision-making policy adaptively through interacting with the unknown environments. However, such an algorithm fails to model the cooperations or competitions among network entities, and simply treats other entities as a part of the environment that may result in the non-stationarity issue. Multi-agent Reinforcement Learning (MARL) allows each network entity to learn its optimal policy by observing not only the environments, but also other entities' policies. As a result, MARL can significantly improve the learning efficiency of the network entities, and it has been recently used to solve various issues in the emerging networks. In this paper, we thus review the applications of MARL in the emerging networks. In particular, we provide a tutorial of MARL and a comprehensive survey of applications of MARL in next generation Internet. In particular, we first introduce single-agent RL and MARL. Then, we review a number of applications of MARL to solve emerging issues in future Internet. The issues consist of network access, transmit power control, computation offloading, content caching, packet routing, trajectory design for UAV-aided networks, and network security issues.
2022-09-10 05:41:04 - Code Compliance Assessment as a Learning Problem - *Neela Sawant, Srinivasan H. Sengamedu* - `2209.04602v1` - [abs](http://arxiv.org/abs/2209.04602v1) - [pdf](http://arxiv.org/pdf/2209.04602v1) > Manual code reviews and static code analyzers are the traditional mechanisms to verify if source code complies with coding policies. However, these mechanisms are hard to scale. We formulate code compliance assessment as a machine learning (ML) problem, to take as input a natural language policy and code, and generate a prediction on the code's compliance, non-compliance, or irrelevance. This can help scale compliance classification and search for policies not covered by traditional mechanisms. We explore key research questions on ML model formulation, training data, and evaluation setup. The core idea is to obtain a joint code-text embedding space which preserves compliance relationships via the vector distance of code and policy embeddings. As there is no task-specific data, we re-interpret and filter commonly available software datasets with additional pre-training and pre-finetuning tasks that reduce the semantic gap. We benchmarked our approach on two listings of coding policies (CWE and CBP). This is a zero-shot evaluation as none of the policies occur in the training set. On CWE and CBP respectively, our tool Policy2Code achieves classification accuracies of (59%, 71%) and search MRR of (0.05, 0.21) compared to CodeBERT with classification accuracies of (37%, 54%) and MRR of (0.02, 0.02). In a user study, 24% Policy2Code detections were accepted compared to 7% for CodeBERT.
2022-09-10 17:03:34 - A Survey in Automatic Irony Processing: Linguistic, Cognitive, and Multi-X Perspectives - *Qingcheng Zeng, An-Ran Li* - `2209.04712v1` - [abs](http://arxiv.org/abs/2209.04712v1) - [pdf](http://arxiv.org/pdf/2209.04712v1) > Irony is a ubiquitous figurative language in daily communication. Previously, many researchers have approached irony from linguistic, cognitive science, and computational aspects. Recently, some progress have been witnessed in automatic irony processing due to the rapid development in deep neural models in natural language processing (NLP). In this paper, we will provide a comprehensive overview of computational irony, insights from linguistic theory and cognitive science, as well as its interactions with downstream NLP tasks and newly proposed multi-X irony processing perspectives.
2022-09-10 22:00:30 - Diffusion Models in Vision: A Survey - *Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah* - `2209.04747v1` - [abs](http://arxiv.org/abs/2209.04747v1) - [pdf](http://arxiv.org/pdf/2209.04747v1) > Denoising diffusion models represent a recent emerging topic in computer vision, demonstrating remarkable results in the area of generative modeling. A diffusion model is a deep generative model that is based on two stages, a forward diffusion stage and a reverse diffusion stage. In the forward diffusion stage, the input data is gradually perturbed over several steps by adding Gaussian noise. In the reverse stage, a model is tasked at recovering the original input data by learning to gradually reverse the diffusion process, step by step. Diffusion models are widely appreciated for the quality and diversity of the generated samples, despite their known computational burdens, i.e. low speeds due to the high number of steps involved during sampling. In this survey, we provide a comprehensive review of articles on denoising diffusion models applied in vision, comprising both theoretical and practical contributions in the field. First, we identify and present three generic diffusion modeling frameworks, which are based on denoising diffusion probabilistic models, noise conditioned score networks, and stochastic differential equations. We further discuss the relations between diffusion models and other deep generative models, including variational auto-encoders, generative adversarial networks, energy-based models, autoregressive models and normalizing flows. Then, we introduce a multi-perspective categorization of diffusion models applied in computer vision. Finally, we illustrate the current limitations of diffusion models and envision some interesting directions for future research.
2022-09-11 03:52:27 - MAiVAR: Multimodal Audio-Image and Video Action Recognizer - *Muhammad Bilal Shaikh, Douglas Chai, Syed Mohammed Shamsul Islam, Naveed Akhtar* - `2209.04780v1` - [abs](http://arxiv.org/abs/2209.04780v1) - [pdf](http://arxiv.org/pdf/2209.04780v1) > Currently, action recognition is predominately performed on video data as processed by CNNs. We investigate if the representation process of CNNs can also be leveraged for multimodal action recognition by incorporating image-based audio representations of actions in a task. To this end, we propose Multimodal Audio-Image and Video Action Recognizer (MAiVAR), a CNN-based audio-image to video fusion model that accounts for video and audio modalities to achieve superior action recognition performance. MAiVAR extracts meaningful image representations of audio and fuses it with video representation to achieve better performance as compared to both modalities individually on a large-scale action recognition dataset.
2022-09-11 06:12:28 - Multiple Object Tracking in Recent Times: A Literature Review - *Mk Bashar, Samia Islam, Kashifa Kawaakib Hussain, Md. Bakhtiar Hasan, A. B. M. Ashikur Rahman, Md. Hasanul Kabir* - `2209.04796v1` - [abs](http://arxiv.org/abs/2209.04796v1) - [pdf](http://arxiv.org/pdf/2209.04796v1) > Multiple object tracking gained a lot of interest from researchers in recent years, and it has become one of the trending problems in computer vision, especially with the recent advancement of autonomous driving. MOT is one of the critical vision tasks for different issues like occlusion in crowded scenes, similar appearance, small object detection difficulty, ID switching, etc. To tackle these challenges, as researchers tried to utilize the attention mechanism of transformer, interrelation of tracklets with graph convolutional neural network, appearance similarity of objects in different frames with the siamese network, they also tried simple IOU matching based CNN network, motion prediction with LSTM. To take these scattered techniques under an umbrella, we have studied more than a hundred papers published over the last three years and have tried to extract the techniques that are more focused on by researchers in recent times to solve the problems of MOT. We have enlisted numerous applications, possibilities, and how MOT can be related to real life. Our review has tried to show the different perspectives of techniques that researchers used overtimes and give some future direction for the potential researchers. Moreover, we have included popular benchmark datasets and metrics in this review.
2022-09-11 21:53:43 - Unsupervised Learning of 3D Scene Flow with 3D Odometry Assistance - *Guangming Wang, Zhiheng Feng, Chaokang Jiang, Hesheng Wang* - `2209.04945v1` - [abs](http://arxiv.org/abs/2209.04945v1) - [pdf](http://arxiv.org/pdf/2209.04945v1) > Scene flow represents the 3D motion of each point in the scene, which explicitly describes the distance and the direction of each point's movement. Scene flow estimation is used in various applications such as autonomous driving fields, activity recognition, and virtual reality fields. As it is challenging to annotate scene flow with ground truth for real-world data, this leaves no real-world dataset available to provide a large amount of data with ground truth for scene flow estimation. Therefore, many works use synthesized data to pre-train their network and real-world LiDAR data to finetune. Unlike the previous unsupervised learning of scene flow in point clouds, we propose to use odometry information to assist the unsupervised learning of scene flow and use real-world LiDAR data to train our network. Supervised odometry provides more accurate shared cost volume for scene flow. In addition, the proposed network has mask-weighted warp layers to get a more accurate predicted point cloud. The warp operation means applying an estimated pose transformation or scene flow to a source point cloud to obtain a predicted point cloud and is the key to refining scene flow from coarse to fine. When performing warp operations, the points in different states use different weights for the pose transformation and scene flow transformation. We classify the states of points as static, dynamic, and occluded, where the static masks are used to divide static and dynamic points, and the occlusion masks are used to divide occluded points. The mask-weighted warp layer indicates that static masks and occlusion masks are used as weights when performing warp operations. Our designs are proved to be effective in ablation experiments. The experiment results show the promising prospect of an odometry-assisted unsupervised learning method for 3D scene flow in real-world data.
2022-09-12 12:49:14 - A Survey of Machine Unlearning - *Thanh Tam Nguyen, Thanh Trung Huynh, Phi Le Nguyen, Alan Wee-Chung Liew, Hongzhi Yin, Quoc Viet Hung Nguyen* - `2209.02299v4` - [abs](http://arxiv.org/abs/2209.02299v4) - [pdf](http://arxiv.org/pdf/2209.02299v4) > Computer systems hold a large amount of personal data over decades. On the one hand, such data abundance allows breakthroughs in artificial intelligence (AI), especially machine learning (ML) models. On the other hand, it can threaten the privacy of users and weaken the trust between humans and AI. Recent regulations require that private information about a user can be removed from computer systems in general and from ML models in particular upon request (e.g. the "right to be forgotten"). While removing data from back-end databases should be straightforward, it is not sufficient in the AI context as ML models often "remember" the old data. Existing adversarial attacks proved that we can learn private membership or attributes of the training data from the trained models. This phenomenon calls for a new paradigm, namely machine unlearning, to make ML models forget about particular data. It turns out that recent works on machine unlearning have not been able to solve the problem completely due to the lack of common frameworks and resources. In this survey paper, we seek to provide a thorough investigation of machine unlearning in its definitions, scenarios, mechanisms, and applications. Specifically, as a categorical collection of state-of-the-art research, we hope to provide a broad reference for those seeking a primer on machine unlearning and its various formulations, design requirements, removal requests, algorithms, and uses in a variety of ML applications. Furthermore, we hope to outline key findings and trends in the paradigm as well as highlight new areas of research that have yet to see the application of machine unlearning, but could nonetheless benefit immensely. We hope this survey provides a valuable reference for ML researchers as well as those seeking to innovate privacy technologies. Our resources are at https://github.com/tamlhp/awesome-machine-unlearning.
2022-09-12 13:11:25 - A Review on Visual-SLAM: Advancements from Geometric Modelling to Learning-based Semantic Scene Understanding - *Tin Lai* - `2209.05222v1` - [abs](http://arxiv.org/abs/2209.05222v1) - [pdf](http://arxiv.org/pdf/2209.05222v1) > Simultaneous Localisation and Mapping (SLAM) is one of the fundamental problems in autonomous mobile robots where a robot needs to reconstruct a previously unseen environment while simultaneously localising itself with respect to the map. In particular, Visual-SLAM uses various sensors from the mobile robot for collecting and sensing a representation of the map. Traditionally, geometric model-based techniques were used to tackle the SLAM problem, which tends to be error-prone under challenging environments. Recent advancements in computer vision, such as deep learning techniques, have provided a data-driven approach to tackle the Visual-SLAM problem. This review summarises recent advancements in the Visual-SLAM domain using various learning-based methods. We begin by providing a concise overview of the geometric model-based approaches, followed by technical reviews on the current paradigms in SLAM. Then, we present the various learning-based approaches to collecting sensory inputs from mobile robots and performing scene understanding. The current paradigms in deep-learning-based semantic understanding are discussed and placed under the context of Visual-SLAM. Finally, we discuss challenges and further opportunities in the direction of learning-based approaches in Visual-SLAM.
2022-09-12 14:38:20 - Leveraging Artificial Intelligence Techniques for Smart Palm Tree Detection: A Decade Systematic Review - *Yosra Hajjaji, Wadii Boulila, Imed Riadh Farah* - `2209.05282v1` - [abs](http://arxiv.org/abs/2209.05282v1) - [pdf](http://arxiv.org/pdf/2209.05282v1) > Over the past few years, total financial investment in the agricultural sector has increased substantially. Palm tree is important for many countries' economies, particularly in northern Africa and the Middle East. Monitoring in terms of detection and counting palm trees provides useful information for various stakeholders; it helps in yield estimation and examination to ensure better crop quality and prevent pests, diseases, better irrigation, and other potential threats. Despite their importance, this information is still challenging to obtain. This study systematically reviews research articles between 2011 and 2021 on artificial intelligence (AI) technology for smart palm tree detection. A systematic review (SR) was performed using the PRISMA approach based on a four-stage selection process. Twenty-two articles were included for the synthesis activity reached from the search strategy alongside the inclusion criteria in order to answer to two main research questions. The study's findings reveal patterns, relationships, networks, and trends in applying artificial intelligence in palm tree detection over the last decade. Despite the good results in most of the studies, the effective and efficient management of large-scale palm plantations is still a challenge. In addition, countries whose economies strongly related to intelligent palm services, especially in North Africa, should give more attention to this kind of study. The results of this research could benefit both the research community and stakeholders.
2022-09-12 14:56:14 - A Review of Challenges in Machine Learning based Automated Hate Speech Detection - *Abhishek Velankar, Hrushikesh Patil, Raviraj Joshi* - `2209.05294v1` - [abs](http://arxiv.org/abs/2209.05294v1) - [pdf](http://arxiv.org/pdf/2209.05294v1) > The spread of hate speech on social media space is currently a serious issue. The undemanding access to the enormous amount of information being generated on these platforms has led people to post and react with toxic content that originates violence. Though efforts have been made toward detecting and restraining such content online, it is still challenging to identify it accurately. Deep learning based solutions have been at the forefront of identifying hateful content. However, the factors such as the context-dependent nature of hate speech, the intention of the user, undesired biases, etc. make this process overcritical. In this work, we deeply explore a wide range of challenges in automatic hate speech detection by presenting a hierarchical organization of these problems. We focus on challenges faced by machine learning or deep learning based solutions to hate speech identification. At the top level, we distinguish between data level, model level, and human level challenges. We further provide an exhaustive analysis of each level of the hierarchy with examples. This survey will help researchers to design their solutions more efficiently in the domain of hate speech detection.
2022-09-12 15:05:41 - Deep Convolutional Pooling Transformer for Deepfake Detection - *Tianyi Wang, Harry Cheng, Kam Pui Chow, Liqiang Nie* - `2209.05299v1` - [abs](http://arxiv.org/abs/2209.05299v1) - [pdf](http://arxiv.org/pdf/2209.05299v1) > Recently, Deepfake has drawn considerable public attention due to security and privacy concerns in social media digital forensics. As the wildly spreading Deepfake videos on the Internet become more realistic, traditional detection techniques have failed in distinguishing between the real and fake. Most existing deep learning methods mainly focus on local features and relations within the face image using convolutional neural networks as a backbone. However, local features and relations are insufficient for model training to learn enough general information for Deepfake detection. Therefore, the existing Deepfake detection methods have reached a bottleneck to further improving the detection performance. To address this issue, we propose a deep convolutional Transformer to incorporate the decisive image features both locally and globally. Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance the efficacy. Moreover, we employ the barely discussed image keyframes in model training for performance improvement and visualize the feature quantity gap between the key and normal image frames caused by video compression. We finally illustrate the transferability with extensive experiments on several Deepfake benchmark datasets. The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
2022-09-12 15:29:13 - Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe - *Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, Huijie Wang, Enze Xie, Zhiqi Li, Hanming Deng, Hao Tian, Xizhou Zhu, Li Chen, Yulu Gao, Xiangwei Geng, Jia Zeng, Yang Li, Jiazhi Yang, Xiaosong Jia, Bohan Yu, Yu Qiao, Dahua Lin, Si Liu, Junchi Yan, Jianping Shi, Ping Luo* - `2209.05324v1` - [abs](http://arxiv.org/abs/2209.05324v1) - [pdf](http://arxiv.org/pdf/2209.05324v1) > Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending and drawing extensive attention both from industry and academia. Conventional approaches for most autonomous driving algorithms perform detection, segmentation, tracking, etc., in a front or perspective view. As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view come of vital importance. BEV perception inherits several advantages, as representing surrounding scenes in BEV is intuitive and fusion-friendly; and representing objects in BEV is most desirable for subsequent modules as in planning and/or control. The core problems for BEV perception lie in (a) how to reconstruct the lost 3D information via view transformation from perspective view to BEV; (b) how to acquire ground truth annotations in BEV grid; (c) how to formulate the pipeline to incorporate features from different sources and views; and (d) how to adapt and generalize algorithms as sensor configurations vary across different scenarios. In this survey, we review the most recent work on BEV perception and provide an in-depth analysis of different solutions. Moreover, several systematic designs of BEV approach from the industry are depicted as well. Furthermore, we introduce a full suite of practical guidebook to improve the performance of BEV perception tasks, including camera, LiDAR and fusion inputs. At last, we point out the future research directions in this area. We hope this report would shed some light on the community and encourage more research effort on BEV perception. We keep an active repository to collect the most recent work and provide a toolbox for bag of tricks at https://github.com/OpenPerceptionX/BEVPerception-Survey-Recipe.
2022-09-12 16:06:10 - Analysis and Comparison of Classification Metrics - *Luciana Ferrer* - `2209.05355v1` - [abs](http://arxiv.org/abs/2209.05355v1) - [pdf](http://arxiv.org/pdf/2209.05355v1) > A number of different performance metrics are commonly used in the machine learning literature for classification systems that output categorical decisions. Some of the most common ones are accuracy, total error (one minus accuracy), balanced accuracy, balanced total error (one minus balanced accuracy), F-score, and Matthews correlation coefficient (MCC). In this document, we review the definition of these metrics and compare them with the expected cost (EC), a metric introduced in every statistical learning course but rarely used in the machine learning literature. We show that the empirical estimate of the EC is a generalized version of both the total error and balanced total error. Further, we show its relation with F-score and MCC and argue that EC is superior to them, being more general, simpler, intuitive and well motivated. We highlight some issues with the F-score and the MCC that make them suboptimal metrics. While not explained in the current version of this manuscript, where we focus exclusively on metrics that are computed over hard decisions, the EC has the additional advantage of being a great tool to measure calibration of a system's scores and allows users to make optimal decisions given a set of posteriors for each class. We leave that discussion for a future version of this manuscript.
2022-09-12 17:00:31 - On Faithfulness and Coherence of Language Explanations for Recommendation Systems - *Zhouhang Xie, Julian McAuley, Bodhisattwa Prasad Majumder* - `2209.05409v1` - [abs](http://arxiv.org/abs/2209.05409v1) - [pdf](http://arxiv.org/pdf/2209.05409v1) > Reviews contain rich information about product characteristics and user interests and thus are commonly used to boost recommender system performance. Specifically, previous work show that jointly learning to perform review generation improves rating prediction performance. Meanwhile, these model-produced reviews serve as recommendation explanations, providing the user with insights on predicted ratings. However, while existing models could generate fluent, human-like reviews, it is unclear to what degree the reviews fully uncover the rationale behind the jointly predicted rating. In this work, we perform a series of evaluations that probes state-of-the-art models and their review generation component. We show that the generated explanations are brittle and need further evaluation before being taken as literal rationales for the estimated ratings.
2022-09-12 20:02:12 - Intrusion Detection Systems Using Support Vector Machines on the KDDCUP'99 and NSL-KDD Datasets: A Comprehensive Survey - *Mikel K. Ngueajio, Gloria Washington, Danda B. Rawat, Yolande Ngueabou* - `2209.05579v1` - [abs](http://arxiv.org/abs/2209.05579v1) - [pdf](http://arxiv.org/pdf/2209.05579v1) > With the growing rates of cyber-attacks and cyber espionage, the need for better and more powerful intrusion detection systems (IDS) is even more warranted nowadays. The basic task of an IDS is to act as the first line of defense, in detecting attacks on the internet. As intrusion tactics from intruders become more sophisticated and difficult to detect, researchers have started to apply novel Machine Learning (ML) techniques to effectively detect intruders and hence preserve internet users' information and overall trust in the entire internet network security. Over the last decade, there has been an explosion of research on intrusion detection techniques based on ML and Deep Learning (DL) architectures on various cyber security-based datasets such as the DARPA, KDDCUP'99, NSL-KDD, CAIDA, CTU-13, UNSW-NB15. In this research, we review contemporary literature and provide a comprehensive survey of different types of intrusion detection technique that applies Support Vector Machines (SVMs) algorithms as a classifier. We focus only on studies that have been evaluated on the two most widely used datasets in cybersecurity namely: the KDDCUP'99 and the NSL-KDD datasets. We provide a summary of each method, identifying the role of the SVMs classifier, and all other algorithms involved in the studies. Furthermore, we present a critical review of each method, in tabular form, highlighting the performance measures, strengths, and limitations of each of the methods surveyed.
2022-09-13 02:57:05 - Vision Transformers for Action Recognition: A Survey - *Anwaar Ulhaq, Naveed Akhtar, Ganna Pogrebna, Ajmal Mian* - `2209.05700v1` - [abs](http://arxiv.org/abs/2209.05700v1) - [pdf](http://arxiv.org/pdf/2209.05700v1) > Vision transformers are emerging as a powerful tool to solve computer vision problems. Recent techniques have also proven the efficacy of transformers beyond the image domain to solve numerous video-related tasks. Among those, human action recognition is receiving special attention from the research community due to its widespread applications. This article provides the first comprehensive survey of vision transformer techniques for action recognition. We analyze and summarize the existing and emerging literature in this direction while highlighting the popular trends in adapting transformers for action recognition. Due to their specialized application, we collectively refer to these methods as ``action transformers''. Our literature review provides suitable taxonomies for action transformers based on their architecture, modality, and intended objective. Within the context of action transformers, we explore the techniques to encode spatio-temporal data, dimensionality reduction, frame patch and spatio-temporal cube construction, and various representation methods. We also investigate the optimization of spatio-temporal attention in transformer layers to handle longer sequences, typically by reducing the number of tokens in a single attention operation. Moreover, we also investigate different network learning strategies, such as self-supervised and zero-shot learning, along with their associated losses for transformer-based action recognition. This survey also summarizes the progress towards gaining grounds on evaluation metric scores on important benchmarks with action transformers. Finally, it provides a discussion on the challenges, outlook, and future avenues for this research direction.
2022-09-13 04:30:40 - A Guide to Employ Hyperspectral Imaging for Assessing Wheat Quality at Different Stages of Supply Chain in Australia: A Review - *Priyabrata Karmakar, Shyh Wei Teng. Manzur Murshed, Paul Pang, Cuong Van Bui* - `2209.05727v1` - [abs](http://arxiv.org/abs/2209.05727v1) - [pdf](http://arxiv.org/pdf/2209.05727v1) > Wheat is one of the major staple crops across the globe. Therefore, it is mandatory to measure, maintain and improve the wheat quality for human consumption. Traditional wheat quality measurement methods are mostly invasive, destructive and limited to small samples of wheat. In a typical supply chain of wheat, there are many receival points where bulk wheat arrives, gets stored and forwarded as per the requirements. In this receival points, the application of traditional quality measurement methods is difficult and often very expensive. Therefore, there is a need for non-invasive, non-destructive real-time methods for wheat quality assessments. One such method that fulfils the above-mentioned criteria is hyperspectral imaging (HSI) for food quality measurement and it can also be applied to bulk samples. In this paper, we have investigated how HSI has been used in the literature for assessing stored wheat quality. So that the required information to implement real-time digital quality assessment methods at the different stages of Australian supply chain can be made available in a single and compact document.
2022-09-13 12:43:41 - Learning to Prevent Profitless Neural Code Completion - *Zhensu Sun, Xiaoning Du, Fu Song, Shangwen Wang, Mingze Ni, Li Li* - `2209.05948v1` - [abs](http://arxiv.org/abs/2209.05948v1) - [pdf](http://arxiv.org/pdf/2209.05948v1) > Currently, large pre-trained models are widely applied in neural code completion systems, such as Github Copilot, aiXcoder, and TabNine. Though large models significantly outperform their smaller counterparts, a survey with 2,631 participants reveals that around 70\% displayed code completions from Copilot are not accepted by developers. Being reviewed but not accepted, these completions bring a threat to productivity. Besides, considering the high cost of the large models, it is a huge waste of computing resources and energy, which severely goes against the sustainable development principle of AI technologies. Additionally, in code completion systems, the completion requests are automatically and actively issued to the models as developers type out, which significantly aggravates the workload. However, to the best of our knowledge, such waste has never been realized, not to mention effectively addressed, in the context of neural code completion. Hence, preventing such profitless code completions from happening in a cost-friendly way is of urgent need. To fill this gap, we first investigate the prompts of these completions and find four observable prompt patterns, which demonstrate the feasibility of identifying such prompts based on prompts themselves. Motivated by this finding, we propose an early-rejection mechanism to turn down low-return prompts by foretelling the completion qualities without sending them to the LCM. Further, we propose a lightweight Transformer-based estimator to demonstrate the feasibility of the mechanism. The experimental results show that the estimator rejects low-return prompts with a promising accuracy of 83.2%.
2022-09-13 19:26:36 - Learning affective meanings that derives the social behavior using Bidirectional Encoder Representations from Transformers - *Moeen Mostafavi, Michael D. Porter, Dawn T. Robinson* - `2202.00065v2` - [abs](http://arxiv.org/abs/2202.00065v2) - [pdf](http://arxiv.org/pdf/2202.00065v2) > Predicting the outcome of a process requires modeling the system dynamic and observing the states. In the context of social behaviors, sentiments characterize the states of the system. Affect Control Theory (ACT) uses sentiments to manifest potential interaction. ACT is a generative theory of culture and behavior based on a three-dimensional sentiment lexicon. Traditionally, the sentiments are quantified using survey data which is fed into a regression model to explain social behavior. The lexicons used in the survey are limited due to prohibitive cost. This paper uses a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model to develop a replacement for these surveys. This model achieves state-of-the-art accuracy in estimating affective meanings, expanding the affective lexicon, and allowing more behaviors to be explained.
2022-09-14 01:52:17 - A Review and Roadmap of Deep Learning Causal Discovery in Different Variable Paradigms - *Hang Chen, Keqing Du, Xinyu Yang, Chenguang Li* - `2209.06367v1` - [abs](http://arxiv.org/abs/2209.06367v1) - [pdf](http://arxiv.org/pdf/2209.06367v1) > Understanding causality helps to structure interventions to achieve specific goals and enables predictions under interventions. With the growing importance of learning causal relationships, causal discovery tasks have transitioned from using traditional methods to infer potential causal structures from observational data to the field of pattern recognition involved in deep learning. The rapid accumulation of massive data promotes the emergence of causal search methods with brilliant scalability. Existing summaries of causal discovery methods mainly focus on traditional methods based on constraints, scores and FCMs, there is a lack of perfect sorting and elaboration for deep learning-based methods, also lacking some considers and exploration of causal discovery methods from the perspective of variable paradigms. Therefore, we divide the possible causal discovery tasks into three types according to the variable paradigm and give the definitions of the three tasks respectively, define and instantiate the relevant datasets for each task and the final causal model constructed at the same time, then reviews the main existing causal discovery methods for different tasks. Finally, we propose some roadmaps from different perspectives for the current research gaps in the field of causal discovery and point out future research directions.
2022-09-14 01:55:15 - Data-Driven Machine Learning Models for a Multi-Objective Flapping Fin Unmanned Underwater Vehicle Control System - *Julian Lee, Kamal Viswanath, Jason Geder, Alisha Sharma, Marius Pruessner, Brian Zhou* - `2209.06369v1` - [abs](http://arxiv.org/abs/2209.06369v1) - [pdf](http://arxiv.org/pdf/2209.06369v1) > Flapping-fin unmanned underwater vehicle (UUV) propulsion systems provide high maneuverability for naval tasks such as surveillance and terrain exploration. Recent work has explored the use of time-series neural network surrogate models to predict thrust from vehicle design and fin kinematics. We develop a search-based inverse model that leverages a kinematics-to-thrust neural network model for control system design. Our inverse model finds a set of fin kinematics with the multi-objective goal of reaching a target thrust and creating a smooth kinematic transition between flapping cycles. We demonstrate how a control system integrating this inverse model can make online, cycle-to-cycle adjustments to prioritize different system objectives.
2022-09-14 02:37:06 - Self-Supervised Clustering on Image-Subtracted Data with Deep-Embedded Self-Organizing Map - *Y. -L. Mong, K. Ackley, T. L. Killestein, D. K. Galloway, M. Dyer, R. Cutter, M. J. I. Brown, J. Lyman, K. Ulaczyk, D. Steeghs, V. Dhillon, P. O'Brien, G. Ramsay, K. Noysena, R. Kotak, R. Breton, L. Nuttall, E. Palle, D. Pollacco, E. Thrane, S. Awiphan, U. Burhanudin, P. Chote, A. Chrimes, E. Daw, C. Duffy, R. Eyles-Ferris, B. P. Gompertz, T. Heikkila, P. Irawati, M. Kennedy, A. Levan, S. Littlefair, L. Makrygianni, T. Marsh, D. Mata Sanchez, S. Mattila, J. R. Maund, J. McCormac, D. Mkrtichian, J. Mullaney, E. Rol, U. Sawangwit, E. Stanway, R. Starling, P. Strom, S. Tooke, K. Wiersema* - `2209.06375v1` - [abs](http://arxiv.org/abs/2209.06375v1) - [pdf](http://arxiv.org/pdf/2209.06375v1) > Developing an effective automatic classifier to separate genuine sources from artifacts is essential for transient follow-ups in wide-field optical surveys. The identification of transient detections from the subtraction artifacts after the image differencing process is a key step in such classifiers, known as real-bogus classification problem. We apply a self-supervised machine learning model, the deep-embedded self-organizing map (DESOM) to this "real-bogus" classification problem. DESOM combines an autoencoder and a self-organizing map to perform clustering in order to distinguish between real and bogus detections, based on their dimensionality-reduced representations. We use 32x32 normalized detection thumbnails as the input of DESOM. We demonstrate different model training approaches, and find that our best DESOM classifier shows a missed detection rate of 6.6% with a false positive rate of 1.5%. DESOM offers a more nuanced way to fine-tune the decision boundary identifying likely real detections when used in combination with other types of classifiers, for example built on neural networks or decision trees. We also discuss other potential usages of DESOM and its limitations.
2022-09-14 03:35:25 - A Survey on Evolutionary Computation for Computer Vision and Image Analysis: Past, Present, and Future Trends - *Ying Bi, Bing Xue, Pablo Mesejo, Stefano Cagnoni, Mengjie Zhang* - `2209.06399v1` - [abs](http://arxiv.org/abs/2209.06399v1) - [pdf](http://arxiv.org/pdf/2209.06399v1) > Computer vision (CV) is a big and important field in artificial intelligence covering a wide range of applications. Image analysis is a major task in CV aiming to extract, analyse and understand the visual content of images. However, image-related tasks are very challenging due to many factors, e.g., high variations across images, high dimensionality, domain expertise requirement, and image distortions. Evolutionary computation (EC) approaches have been widely used for image analysis with significant achievement. However, there is no comprehensive survey of existing EC approaches to image analysis. To fill this gap, this paper provides a comprehensive survey covering all essential EC approaches to important image analysis tasks including edge detection, image segmentation, image feature analysis, image classification, object detection, and others. This survey aims to provide a better understanding of evolutionary computer vision (ECV) by discussing the contributions of different approaches and exploring how and why EC is used for CV and image analysis. The applications, challenges, issues, and trends associated to this research field are also discussed and summarised to provide further guidelines and opportunities for future research.
2022-09-14 05:45:26 - Semantic Visual Simultaneous Localization and Mapping: A Survey - *Kaiqi Chen, Jianhua Zhang, Jialing Liu, Qiyi Tong, Ruyu Liu, Shengyong Chen* - `2209.06428v1` - [abs](http://arxiv.org/abs/2209.06428v1) - [pdf](http://arxiv.org/pdf/2209.06428v1) > Visual Simultaneous Localization and Mapping (vSLAM) has achieved great progress in the computer vision and robotics communities, and has been successfully used in many fields such as autonomous robot navigation and AR/VR. However, vSLAM cannot achieve good localization in dynamic and complex environments. Numerous publications have reported that, by combining with the semantic information with vSLAM, the semantic vSLAM systems have the capability of solving the above problems in recent years. Nevertheless, there is no comprehensive survey about semantic vSLAM. To fill the gap, this paper first reviews the development of semantic vSLAM, explicitly focusing on its strengths and differences. Secondly, we explore three main issues of semantic vSLAM: the extraction and association of semantic information, the application of semantic information, and the advantages of semantic vSLAM. Then, we collect and analyze the current state-of-the-art SLAM datasets which have been widely used in semantic vSLAM systems. Finally, we discuss future directions that will provide a blueprint for the future development of semantic vSLAM.
2022-09-14 08:51:02 - Revisiting Crowd Counting: State-of-the-art, Trends, and Future Perspectives - *Muhammad Asif Khan, Hamid Menouar, Ridha Hamila* - `2209.07271v1` - [abs](http://arxiv.org/abs/2209.07271v1) - [pdf](http://arxiv.org/pdf/2209.07271v1) > Crowd counting is an effective tool for situational awareness in public places. Automated crowd counting using images and videos is an interesting yet challenging problem that has gained significant attention in computer vision. Over the past few years, various deep learning methods have been developed to achieve state-of-the-art performance. The methods evolved over time vary in many aspects such as model architecture, input pipeline, learning paradigm, computational complexity, and accuracy gains etc. In this paper, we present a systematic and comprehensive review of the most significant contributions in the area of crowd counting. Although few surveys exist on the topic, our survey is most up-to date and different in several aspects. First, it provides a more meaningful categorization of the most significant contributions by model architectures, learning methods (i.e., loss functions), and evaluation methods (i.e., evaluation metrics). We chose prominent and distinct works and excluded similar works. We also sort the well-known crowd counting models by their performance over benchmark datasets. We believe that this survey can be a good resource for novice researchers to understand the progressive developments and contributions over time and the current state-of-the-art.
2022-09-14 09:30:11 - Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5) - *Shijie Geng, Shuchang Liu, Zuohui Fu, Yingqiang Ge, Yongfeng Zhang* - `2203.13366v5` - [abs](http://arxiv.org/abs/2203.13366v5) - [pdf](http://arxiv.org/pdf/2203.13366v5) > For a long time, different recommendation tasks typically require designing task-specific architectures and training objectives. As a result, it is hard to transfer the learned knowledge and representations from one task to another, thus restricting the generalization ability of existing recommendation approaches, e.g., a sequential recommendation model can hardly be applied or transferred to a review generation method. To deal with such issues, considering that language can describe almost anything and language grounding is a powerful medium to represent various problems or tasks, we present a flexible and unified text-to-text paradigm called "Pretrain, Personalized Prompt, and Predict Paradigm" (P5) for recommendation, which unifies various recommendation tasks in a shared framework. In P5, all data such as user-item interactions, user descriptions, item metadata, and user reviews are converted to a common format -- natural language sequences. The rich information from natural language assists P5 to capture deeper semantics for personalization and recommendation. Specifically, P5 learns different tasks with the same language modeling objective during pretraining. Thus, it serves as the foundation model for various downstream recommendation tasks, allows easy integration with other modalities, and enables instruction-based recommendation based on prompts. P5 advances recommender systems from shallow model to deep model to big model, and will revolutionize the technical form of recommender systems towards universal recommendation engine. With adaptive personalized prompt for different users, P5 is able to make predictions in a zero-shot or few-shot manner and largely reduces the necessity for extensive fine-tuning. On several recommendation benchmarks, we conduct experiments to show the effectiveness of P5. We release the source code at https://github.com/jeykigung/P5.
2022-09-14 09:49:19 - Preregistered protocol for: Articulatory changes in speech following treatment for oral or oropharyngeal cancer: a systematic review - *Thomas B. Tienkamp, Teja Rebernik, Defne Abur, Rob J. J. H. van Son, Sebastiaan A. H. J. de Visscher, Max J. H. Witjes, Martijn Wieling* - `2209.06521v1` - [abs](http://arxiv.org/abs/2209.06521v1) - [pdf](http://arxiv.org/pdf/2209.06521v1) > This document outlines a PROSPERO pre-registered protocol for a systematic review regarding articulatory changes in speech following oral or orophayrngeal cancer treatment. Treatment of tumours in the oral cavity may result in physiological changes that could lead to articulatory difficulties. The tongue becomes less mobile due to scar tissue and/or potential (postoperative) radiation therapy. Moreover, tissue loss may create a bypass for airflow or limit constriction possibilities. In order to gain a better understanding of the nature of the speech problems, information regarding the movement of the articulators is needed since perceptual or acoustic information provide only indirect evidence of articulatory changes. Therefore, this systematic review will review studies that directly measured the articulatory movements of the tongue, jaw, and lips following treatment for oral or oropharyngeal cancer.
2022-09-14 10:01:29 - Explainable AI for clinical and remote health applications: a survey on tabular and time series data - *Flavio Di Martino, Franca Delmastro* - `2209.06528v1` - [abs](http://arxiv.org/abs/2209.06528v1) - [pdf](http://arxiv.org/pdf/2209.06528v1) > Nowadays Artificial Intelligence (AI) has become a fundamental component of healthcare applications, both clinical and remote, but the best performing AI systems are often too complex to be self-explaining. Explainable AI (XAI) techniques are defined to unveil the reasoning behind the system's predictions and decisions, and they become even more critical when dealing with sensitive and personal health data. It is worth noting that XAI has not gathered the same attention across different research areas and data types, especially in healthcare. In particular, many clinical and remote health applications are based on tabular and time series data, respectively, and XAI is not commonly analysed on these data types, while computer vision and Natural Language Processing (NLP) are the reference applications. To provide an overview of XAI methods that are most suitable for tabular and time series data in the healthcare domain, this paper provides a review of the literature in the last 5 years, illustrating the type of generated explanations and the efforts provided to evaluate their relevance and quality. Specifically, we identify clinical validation, consistency assessment, objective and standardised quality evaluation, and human-centered quality assessment as key features to ensure effective explanations for the end users. Finally, we highlight the main research challenges in the field as well as the limitations of existing XAI methods.
2022-09-14 12:48:52 - Learned reconstruction methods with convergence guarantees - *Subhadip Mukherjee, Andreas Hauptmann, Ozan Öktem, Marcelo Pereyra, Carola-Bibiane Schönlieb* - `2206.05431v3` - [abs](http://arxiv.org/abs/2206.05431v3) - [pdf](http://arxiv.org/pdf/2206.05431v3) > In recent years, deep learning has achieved remarkable empirical success for image reconstruction. This has catalyzed an ongoing quest for precise characterization of correctness and reliability of data-driven methods in critical use-cases, for instance in medical imaging. Notwithstanding the excellent performance and efficacy of deep learning-based methods, concerns have been raised regarding their stability, or lack thereof, with serious practical implications. Significant advances have been made in recent years to unravel the inner workings of data-driven image recovery methods, challenging their widely perceived black-box nature. In this article, we will specify relevant notions of convergence for data-driven image reconstruction, which will form the basis of a survey of learned methods with mathematically rigorous reconstruction guarantees. An example that is highlighted is the role of ICNN, offering the possibility to combine the power of deep learning with classical convex regularization theory for devising methods that are provably convergent. This survey article is aimed at both methodological researchers seeking to advance the frontiers of our understanding of data-driven image reconstruction methods as well as practitioners, by providing an accessible description of useful convergence concepts and by placing some of the existing empirical practices on a solid mathematical foundation.
2022-09-14 18:32:26 - On the State of the Art in Authorship Attribution and Authorship Verification - *Jacob Tyo, Bhuwan Dhingra, Zachary C. Lipton* - `2209.06869v1` - [abs](http://arxiv.org/abs/2209.06869v1) - [pdf](http://arxiv.org/pdf/2209.06869v1) > Despite decades of research on authorship attribution (AA) and authorship verification (AV), inconsistent dataset splits/filtering and mismatched evaluation methods make it difficult to assess the state of the art. In this paper, we present a survey of the fields, resolve points of confusion, introduce Valla that standardizes and benchmarks AA/AV datasets and metrics, provide a large-scale empirical evaluation, and provide apples-to-apples comparisons between existing methods. We evaluate eight promising methods on fifteen datasets (including distribution-shifted challenge sets) and introduce a new large-scale dataset based on texts archived by Project Gutenberg. Surprisingly, we find that a traditional Ngram-based model performs best on 5 (of 7) AA tasks, achieving an average macro-accuracy of $76.50\%$ (compared to $66.71\%$ for a BERT-based model). However, on the two AA datasets with the greatest number of words per author, as well as on the AV datasets, BERT-based models perform best. While AV methods are easily applied to AA, they are seldom included as baselines in AA papers. We show that through the application of hard-negative mining, AV methods are competitive alternatives to AA methods. Valla and all experiment code can be found here: https://github.com/JacobTyo/Valla
2022-09-14 19:53:32 - Out of One, Many: Using Language Models to Simulate Human Samples - *Lisa P. Argyle, Ethan C. Busby, Nancy Fulda, Joshua Gubler, Christopher Rytting, David Wingate* - `2209.06899v1` - [abs](http://arxiv.org/abs/2209.06899v1) - [pdf](http://arxiv.org/pdf/2209.06899v1) > We propose and explore the possibility that language models can be studied as effective proxies for specific human sub-populations in social science research. Practical and research applications of artificial intelligence tools have sometimes been limited by problematic biases (such as racism or sexism), which are often treated as uniform properties of the models. We show that the "algorithmic bias" within one such tool -- the GPT-3 language model -- is instead both fine-grained and demographically correlated, meaning that proper conditioning will cause it to accurately emulate response distributions from a wide variety of human subgroups. We term this property "algorithmic fidelity" and explore its extent in GPT-3. We create "silicon samples" by conditioning the model on thousands of socio-demographic backstories from real human participants in multiple large surveys conducted in the United States. We then compare the silicon and human samples to demonstrate that the information contained in GPT-3 goes far beyond surface similarity. It is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and socio-cultural context that characterize human attitudes. We suggest that language models with sufficient algorithmic fidelity thus constitute a novel and powerful tool to advance understanding of humans and society across a variety of disciplines.
2022-09-14 20:43:59 - The need for a more human-centered approach to designing and validating transparent AI in medical image analysis -- Guidelines and Evidence from a Systematic Review - *Haomin Chen, Catalina Gomez, Chien-Ming Huang, Mathias Unberath* - `2112.12596v3` - [abs](http://arxiv.org/abs/2112.12596v3) - [pdf](http://arxiv.org/pdf/2112.12596v3) > Transparency in Machine Learning (ML), attempts to reveal the working mechanisms of complex models. Transparent ML promises to advance human factors engineering goals of human-centered AI in the target users. From a human-centered design perspective, transparency is not a property of the ML model but an affordance, i.e. a relationship between algorithm and user; as a result, iterative prototyping and evaluation with users is critical to attaining adequate solutions that afford transparency. However, following human-centered design principles in healthcare and medical image analysis is challenging due to the limited availability of and access to end users. To investigate the state of transparent ML in medical image analysis, we conducted a systematic review of the literature. Our review reveals multiple severe shortcomings in the design and validation of transparent ML for medical image analysis applications. We find that most studies to date approach transparency as a property of the model itself, similar to task performance, without considering end users during neither development nor evaluation. Additionally, the lack of user research, and the sporadic validation of transparency claims put contemporary research on transparent ML for medical image analysis at risk of being incomprehensible to users, and thus, clinically irrelevant. To alleviate these shortcomings in forthcoming research while acknowledging the challenges of human-centered design in healthcare, we introduce the INTRPRT guideline, a systematic design directive for transparent ML systems in medical image analysis. The INTRPRT guideline suggests formative user research as the first step of transparent model design to understand user needs and domain requirements. Following this process produces evidence to support design choices, and ultimately, increases the likelihood that the algorithms afford transparency.
2022-09-14 23:47:32 - SQL and NoSQL Databases Software architectures performance analysis and assessments -- A Systematic Literature review - *Wisal Khan, Teerath Kumar, Zhang Cheng, Kislay Raj, Arunabha M Roy, Bin Luo* - `2209.06977v1` - [abs](http://arxiv.org/abs/2209.06977v1) - [pdf](http://arxiv.org/pdf/2209.06977v1) > Context: The efficient processing of Big Data is a challenging task for SQL and NoSQL Databases, where competent software architecture plays a vital role. The SQL Databases are designed for structuring data and supporting vertical scalability. In contrast, horizontal scalability is backed by NoSQL Databases and can process sizeable unstructured Data efficiently. One can choose the right paradigm according to the organisation's needs; however, making the correct choice can often be challenging. The SQL and NoSQL Databases follow different architectures. Also, the mixed model is followed by each category of NoSQL Databases. Hence, data movement becomes difficult for cloud consumers across multiple cloud service providers (CSPs). In addition, each cloud platform IaaS, PaaS, SaaS, and DBaaS also monitors various paradigms. Objective: This systematic literature review (SLR) aims to study the related articles associated with SQL and NoSQL Database software architectures and tackle data portability and Interoperability among various cloud platforms. State of the art presented many performance comparison studies of SQL and NoSQL Databases by observing scaling, performance, availability, consistency and sharding characteristics. According to the research studies, NoSQL Database designed structures can be the right choice for big data analytics, while SQL Databases are suitable for OLTP Databases. The researcher proposes numerous approaches associated with data movement in the cloud. Platform-based APIs are developed, which makes users' data movement difficult. Therefore, data portability and Interoperability issues are noticed during data movement across multiple CSPs. To minimize developer efforts and Interoperability, Unified APIs are demanded to make data movement relatively more accessible among various cloud platforms.
2022-09-15 01:20:23 - Task Oriented Video Coding: A Survey - *Daniel Wood* - `2208.07313v2` - [abs](http://arxiv.org/abs/2208.07313v2) - [pdf](http://arxiv.org/pdf/2208.07313v2) > Video coding technology has been continuously improved for higher compression ratio with higher resolution. However, the state-of-the-art video coding standards, such as H.265/HEVC and Versatile Video Coding, are still designed with the assumption the compressed video will be watched by humans. With the tremendous advance and maturation of deep neural networks in solving computer vision tasks, more and more videos are directly analyzed by deep neural networks without humans' involvement. Such a conventional design for video coding standard is not optimal when the compressed video is used by computer vision applications. While the human visual system is consistently sensitive to the content with high contrast, the impact of pixels on computer vision algorithms is driven by specific computer vision tasks. In this paper, we explore and summarize recent progress on computer vision task oriented video coding and emerging video coding standard, Video Coding for Machines.
2022-09-15 02:03:24 - Responsible AI Pattern Catalogue: A Multivocal Literature Review - *Qinghua Lu, Liming Zhu, Xiwei Xu, Jon Whittle, Didar Zowghi, Aurelie Jacquet* - `2209.04963v3` - [abs](http://arxiv.org/abs/2209.04963v3) - [pdf](http://arxiv.org/pdf/2209.04963v3) > Responsible AI has been widely considered as one of the greatest scientific challenges of our time and the key to increase the adoption of AI. A number of AI ethics principles frameworks have been published recently. However, without further best practice guidance, practitioners are left with nothing much beyond truisms. Also, significant efforts have been placed at algorithm-level rather than system-level, mainly focusing on a subset of mathematics-amenable ethical principles (such as fairness). Nevertheless, ethical issues can occur at any step of the development lifecycle crosscutting many AI and non-AI components of systems beyond AI algorithms and models. To operationalize responsible AI from a system perspective, in this paper, we present a Responsible AI Pattern Catalogue based on the results of a Multivocal Literature Review (MLR). Rather than staying at the principle or algorithm level, we focus on patterns that AI system stakeholders can undertake in practice to ensure that the developed AI systems are responsible throughout the entire governance and engineering lifecycle. The Responsible AI Pattern Catalogue classifies the patterns into three groups: multi-level governance patterns, trustworthy process patterns, and responsible-AI-by-design product patterns. These patterns provide a systematic and actionable guidance for stakeholders to implement responsible AI.
2022-09-15 02:06:25 - VIPHY: Probing "Visible" Physical Commonsense Knowledge - *Shikhar Singh, Ehsan Qasemi, Muhao Chen* - `2209.07000v1` - [abs](http://arxiv.org/abs/2209.07000v1) - [pdf](http://arxiv.org/pdf/2209.07000v1) > In recent years, vision-language models (VLMs) have shown remarkable performance on visual reasoning tasks (e.g. attributes, location). While such tasks measure the requisite knowledge to ground and reason over a given visual instance, they do not, however, measure the ability of VLMs to retain and generalize such knowledge. In this work, we evaluate their ability to acquire "visible" physical knowledge -- the information that is easily accessible from images of static scenes, particularly across the dimensions of object color, size and space. We build an automatic pipeline to derive a comprehensive knowledge resource for calibrating and probing these models. Our results indicate a severe gap between model and human performance across all three tasks. Furthermore, our caption pretrained baseline (CapBERT) significantly outperforms VLMs on both size and spatial tasks -- highlighting that despite sufficient access to ground language with visual modality, they struggle to retain such knowledge. The dataset and code are available at https://github.com/Axe--/ViPhy .
2022-09-15 03:43:06 - Diffusion Models: A Comprehensive Survey of Methods and Applications - *Ling Yang, Zhilong Zhang, Shenda Hong, Runsheng Xu, Yue Zhao, Yingxia Shao, Wentao Zhang, Ming-Hsuan Yang, Bin Cui* - `2209.00796v6` - [abs](http://arxiv.org/abs/2209.00796v6) - [pdf](http://arxiv.org/pdf/2209.00796v6) > Diffusion models are a class of deep generative models that have shown impressive results on various tasks with a solid theoretical foundation. Despite demonstrated success than state-of-the-art approaches, diffusion models often entail costly sampling procedures and sub-optimal likelihood estimation. Significant efforts have been made to improve the performance of diffusion models in various aspects. In this article, we present a comprehensive review of existing variants of diffusion models. Specifically, we provide the taxonomy of diffusion models and categorize them into three types: sampling-acceleration enhancement, likelihood-maximization enhancement, and data-generalization enhancement. We also introduce the other generative models (i.e., variational autoencoders, generative adversarial networks, normalizing flow, autoregressive models, and energy-based models) and discuss the connections between diffusion models and these generative models. Then we review the applications of diffusion models, including computer vision, natural language processing, waveform signal processing, multi-modal modeling, molecular graph generation, time series modeling, and adversarial purification. Furthermore, we propose new perspectives pertaining to the development of generative models. Github: https://github.com/YangLing0818/Diffusion-Models-Papers-Survey-Taxonomy.
2022-09-15 05:13:25 - MIPI 2022 Challenge on Under-Display Camera Image Restoration: Methods and Results - *Ruicheng Feng, Chongyi Li, Shangchen Zhou, Wenxiu Sun, Qingpeng Zhu, Jun Jiang, Qingyu Yang, Chen Change Loy, Jinwei Gu* - `2209.07052v1` - [abs](http://arxiv.org/abs/2209.07052v1) - [pdf](http://arxiv.org/pdf/2209.07052v1) > Developing and integrating advanced image sensors with novel algorithms in camera systems are prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). To bridge the gap, we introduce the first MIPI challenge including five tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Under-Display Camera (UDC) Image Restoration track on MIPI 2022. In total, 167 participants were successfully registered, and 19 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Under-Display Camera Image Restoration. A detailed description of all models developed in this challenge is provided in this paper. More details of this challenge and the link to the dataset can be found at https://github.com/mipi-challenge/MIPI2022.
2022-09-15 05:52:29 - CommunityLM: Probing Partisan Worldviews from Language Models - *Hang Jiang, Doug Beeferman, Brandon Roy, Deb Roy* - `2209.07065v1` - [abs](http://arxiv.org/abs/2209.07065v1) - [pdf](http://arxiv.org/pdf/2209.07065v1) > As political attitudes have diverged ideologically in the United States, political speech has diverged lingusitically. The ever-widening polarization between the US political parties is accelerated by an erosion of mutual understanding between them. We aim to make these communities more comprehensible to each other with a framework that probes community-specific responses to the same survey questions using community language models CommunityLM. In our framework we identify committed partisan members for each community on Twitter and fine-tune LMs on the tweets authored by them. We then assess the worldviews of the two groups using prompt-based probing of their corresponding LMs, with prompts that elicit opinions about public figures and groups surveyed by the American National Election Studies (ANES) 2020 Exploratory Testing Survey. We compare the responses generated by the LMs to the ANES survey results, and find a level of alignment that greatly exceeds several baseline methods. Our work aims to show that we can use community LMs to query the worldview of any group of people given a sufficiently large sample of their social media discussions or media diet.
2022-09-15 06:24:01 - Responsible AI Implementation: A Human-centered Framework for Accelerating the Innovation Process - *Dian Tjondronegoro, Elizabeth Yuwono, Brent Richards, Damian Green, Siiri Hatakka* - `2209.07076v1` - [abs](http://arxiv.org/abs/2209.07076v1) - [pdf](http://arxiv.org/pdf/2209.07076v1) > There is still a significant gap between expectations and the successful adoption of AI to innovate and improve businesses. Due to the emergence of deep learning, AI adoption is more complex as it often incorporates big data and the internet of things, affecting data privacy. Existing frameworks have identified the need to focus on human-centered design, combining technical and business/organizational perspectives. However, trust remains a critical issue that needs to be designed from the beginning. The proposed framework expands from the human-centered design approach, emphasizing and maintaining the trust that underpins the process. This paper proposes a theoretical framework for responsible artificial intelligence (AI) implementation. The proposed framework emphasizes a synergistic business technology approach for the agile co-creation process. The aim is to streamline the adoption process of AI to innovate and improve business by involving all stakeholders throughout the project so that the AI technology is designed, developed, and deployed in conjunction with people and not in isolation. The framework presents a fresh viewpoint on responsible AI implementation based on analytical literature review, conceptual framework design, and practitioners' mediating expertise. The framework emphasizes establishing and maintaining trust throughout the human-centered design and agile development of AI. This human-centered approach is aligned with and enabled by the privacy by design principle. The creators of the technology and the end-users are working together to tailor the AI solution specifically for the business requirements and human characteristics. An illustrative case study on adopting AI for assisting planning in a hospital will demonstrate that the proposed framework applies to real-life applications.
2022-09-15 07:49:23 - $ρ$-GNF : A Novel Sensitivity Analysis Approach Under Unobserved Confounders - *Sourabh Balgi, Jose M. Peña, Adel Daoud* - `2209.07111v1` - [abs](http://arxiv.org/abs/2209.07111v1) - [pdf](http://arxiv.org/pdf/2209.07111v1) > We propose a new sensitivity analysis model that combines copulas and normalizing flows for causal inference under unobserved confounding. We refer to the new model as $\rho$-GNF ($\rho$-Graphical Normalizing Flow), where $\rho{\in}[-1,+1]$ is a bounded sensitivity parameter representing the backdoor non-causal association due to unobserved confounding modeled using the most well studied and widely popular Gaussian copula. Specifically, $\rho$-GNF enables us to estimate and analyse the frontdoor causal effect or average causal effect (ACE) as a function of $\rho$. We call this the $\rho_{curve}$. The $\rho_{curve}$ enables us to specify the confounding strength required to nullify the ACE. We call this the $\rho_{value}$. Further, the $\rho_{curve}$ also enables us to provide bounds for the ACE given an interval of $\rho$ values. We illustrate the benefits of $\rho$-GNF with experiments on simulated and real-world data in terms of our empirical ACE bounds being narrower than other popular ACE bounds.
2022-09-15 08:37:12 - The Impact of Edge Displacement Vaserstein Distance on UD Parsing Performance - *Mark Anderson, Carlos Gómez-Rodríguez* - `2209.07139v1` - [abs](http://arxiv.org/abs/2209.07139v1) - [pdf](http://arxiv.org/pdf/2209.07139v1) > We contribute to the discussion on parsing performance in NLP by introducing a measurement that evaluates the differences between the distributions of edge displacement (the directed distance of edges) seen in training and test data. We hypothesize that this measurement will be related to differences observed in parsing performance across treebanks. We motivate this by building upon previous work and then attempt to falsify this hypothesis by using a number of statistical methods. We establish that there is a statistical correlation between this measurement and parsing performance even when controlling for potential covariants. We then use this to establish a sampling technique that gives us an adversarial and complementary split. This gives an idea of the lower and upper bounds of parsing systems for a given treebank in lieu of freshly sampled data. In a broader sense, the methodology presented here can act as a reference for future correlation-based exploratory work in NLP.
2022-09-15 09:49:17 - Literature Review of various Fuzzy Rule based Systems - *Ayush K. Varshney, Vicenç Torra* - `2209.07175v1` - [abs](http://arxiv.org/abs/2209.07175v1) - [pdf](http://arxiv.org/pdf/2209.07175v1) > Fuzzy rule based systems (FRBSs) is a rule-based system which uses linguistic fuzzy variables as antecedents and consequent to represent the human understandable knowledge. They have been applied to various applications and areas throughout the literature. However, FRBSs suffers from many drawbacks such as uncertainty representation, high number of rules, interpretability loss, high computational time for learning etc. To overcome these issues with FRBSs, there exists many extentions of FRBSs. In this paper, we present an overview and literature review for various types and prominent areas of fuzzy systems (FRBSs) namely genetic fuzzy system (GFS), Hierarchical fuzzy system (HFS), neuro fuzzy system (NFS), evolving fuzzy system (eFS), FRBSs for big data, FRBSs for imbalanced data, interpretability in FRBSs and FRBSs which uses cluster centroids as fuzzy rule, during the years 2010-2021. GFS uses genetic/evolutionary approaches to improve the learning ability of FRBSs, HFS solve the curse of dimensionality for FRBSs, NFS improves approximation ability of FRBSs using neural networks and dynamic systems for streaming data is considered in eFS. FRBSs are seen as good solutions for big data and imbalanced data, in the recent years the interpretability in FRBSs has gained popularity due to high dimensional and big data and rules are initialized with cluster centroids to limit the number of rules in FRBSs. This paper also highlights important contributions, publication statistics and current trends in the field. The paper also addresses several open research areas which need further attention from the FRBSs research community.
2022-09-15 10:44:22 - Few-Shot Object Detection: A Comprehensive Survey - *Mona Köhler, Markus Eisenbach, Horst-Michael Gross* - `2112.11699v2` - [abs](http://arxiv.org/abs/2112.11699v2) - [pdf](http://arxiv.org/pdf/2112.11699v2) > Humans are able to learn to recognize new objects even from a few examples. In contrast, training deep-learning-based object detectors requires huge amounts of annotated data. To avoid the need to acquire and annotate these huge amounts of data, few-shot object detection aims to learn from few object instances of new categories in the target domain. In this survey, we provide an overview of the state of the art in few-shot object detection. We categorize approaches according to their training scheme and architectural layout. For each type of approaches, we describe the general realization as well as concepts to improve the performance on novel categories. Whenever appropriate, we give short takeaways regarding these concepts in order to highlight the best ideas. Eventually, we introduce commonly used datasets and their evaluation protocols and analyze reported benchmark results. As a result, we emphasize common challenges in evaluation and identify the most promising current trends in this emerging field of few-shot object detection.
2022-09-15 12:07:09 - Robust Implementation of Foreground Extraction and Vessel Segmentation for X-ray Coronary Angiography Image Sequence - *Zeyu Fu, Zhuang Fu, Chenzhuo Lv, Jun Yan* - `2209.07237v1` - [abs](http://arxiv.org/abs/2209.07237v1) - [pdf](http://arxiv.org/pdf/2209.07237v1) > The extraction of contrast-filled vessels from X-ray coronary angiography(XCA) image sequence has important clinical significance for intuitively diagnosis and therapy. In this study, XCA image sequence O is regarded as a three-dimensional tensor input, vessel layer H is a sparse tensor, and background layer B is a low-rank tensor. Using tensor nuclear norm(TNN) minimization, a novel method for vessel layer extraction based on tensor robust principal component analysis(TRPCA) is proposed. Furthermore, considering the irregular movement of vessels and the dynamic interference of surrounding irrelevant tissues, the total variation(TV) regularized spatial-temporal constraint is introduced to separate the dynamic background E. Subsequently, for the vessel images with uneven contrast distribution, a two-stage region growth(TSRG) method is utilized for vessel enhancement and segmentation. A global threshold segmentation is used as the pre-processing to obtain the main branch, and the Radon-Like features(RLF) filter is used to enhance and connect broken minor segments, the final vessel mask is constructed by combining the two intermediate results. We evaluated the visibility of TV-TRPCA algorithm for foreground extraction and the accuracy of TSRG algorithm for vessel segmentation on real clinical XCA image sequences and third-party database. Both qualitative and quantitative results verify the superiority of the proposed methods over the existing state-of-the-art approaches.
2022-09-15 15:29:57 - Deep Reinforcement Learning for Task Offloading in UAV-Aided Smart Farm Networks - *Anne Catherine Nguyen, Turgay Pamuklu, Aisha Syed, W. Sean Kennedy, Melike Erol-Kantarci* - `2209.07367v1` - [abs](http://arxiv.org/abs/2209.07367v1) - [pdf](http://arxiv.org/pdf/2209.07367v1) > The fifth and sixth generations of wireless communication networks are enabling tools such as internet of things devices, unmanned aerial vehicles (UAVs), and artificial intelligence, to improve the agricultural landscape using a network of devices to automatically monitor farmlands. Surveying a large area requires performing a lot of image classification tasks within a specific period of time in order to prevent damage to the farm in case of an incident, such as fire or flood. UAVs have limited energy and computing power, and may not be able to perform all of the intense image classification tasks locally and within an appropriate amount of time. Hence, it is assumed that the UAVs are able to partially offload their workload to nearby multi-access edge computing devices. The UAVs need a decision-making algorithm that will decide where the tasks will be performed, while also considering the time constraints and energy level of the other UAVs in the network. In this paper, we introduce a Deep Q-Learning (DQL) approach to solve this multi-objective problem. The proposed method is compared with Q-Learning and three heuristic baselines, and the simulation results show that our proposed DQL-based method achieves comparable results when it comes to the UAVs' remaining battery levels and percentage of deadline violations. In addition, our method is able to reach convergence 13 times faster than Q-Learning.
2022-09-15 16:13:19 - FFPA-Net: Efficient Feature Fusion with Projection Awareness for 3D Object Detection - *Chaokang Jiang, Guangming Wang, Jinxing Wu, Yanzi Miao, Hesheng Wang* - `2209.07419v1` - [abs](http://arxiv.org/abs/2209.07419v1) - [pdf](http://arxiv.org/pdf/2209.07419v1) > Promising complementarity exists between the texture features of color images and the geometric information of LiDAR point clouds. However, there still present many challenges for efficient and robust feature fusion in the field of 3D object detection. In this paper, first, unstructured 3D point clouds are filled in the 2D plane and 3D point cloud features are extracted faster using projection-aware convolution layers. Further, the corresponding indexes between different sensor signals are established in advance in the data preprocessing, which enables faster cross-modal feature fusion. To address LiDAR points and image pixels misalignment problems, two new plug-and-play fusion modules, LiCamFuse and BiLiCamFuse, are proposed. In LiCamFuse, soft query weights with perceiving the Euclidean distance of bimodal features are proposed. In BiLiCamFuse, the fusion module with dual attention is proposed to deeply correlate the geometric and textural features of the scene. The quantitative results on the KITTI dataset demonstrate that the proposed method achieves better feature-level fusion. In addition, the proposed network shows a shorter running time compared to existing methods.
2022-09-15 17:37:08 - Unsupervised Opinion Summarization Using Approximate Geodesics - *Somnath Basu Roy Chowdhury, Nicholas Monath, Avinava Dubey, Amr Ahmed, Snigdha Chaturvedi* - `2209.07496v1` - [abs](http://arxiv.org/abs/2209.07496v1) - [pdf](http://arxiv.org/pdf/2209.07496v1) > Opinion summarization is the task of creating summaries capturing popular opinions from user reviews. In this paper, we introduce Geodesic Summarizer (GeoSumm), a novel system to perform unsupervised extractive opinion summarization. GeoSumm involves an encoder-decoder based representation learning model, that generates representations of text as a distribution over latent semantic units. GeoSumm generates these representations by performing dictionary learning over pre-trained text representations at multiple decoder layers. We then use these representations to quantify the relevance of review sentences using a novel approximate geodesic distance based scoring mechanism. We use the relevance scores to identify popular opinions in order to compose general and aspect-specific summaries. Our proposed model, GeoSumm, achieves state-of-the-art performance on three opinion summarization datasets. We perform additional experiments to analyze the functioning of our model and showcase the generalization ability of {\X} across different domains.
2022-09-16 00:21:58 - Multiscale Adaptive Scheduling and Path-Planning for Power-Constrained UAV-Relays via SMDPs - *Bharath Keshavamurthy, Nicolo Michelusi* - `2209.07655v1` - [abs](http://arxiv.org/abs/2209.07655v1) - [pdf](http://arxiv.org/pdf/2209.07655v1) > We describe the orchestration of a decentralized swarm of rotary-wing UAV-relays, augmenting the coverage and service capabilities of a terrestrial base station. Our goal is to minimize the time-average service latencies involved in handling transmission requests from ground users under Poisson arrivals, subject to an average UAV power constraint. Equipped with rate adaptation to efficiently leverage air-to-ground channel stochastics, we first derive the optimal control policy for a single relay via a semi-Markov decision process formulation, with competitive swarm optimization for UAV trajectory design. Accordingly, we detail a multiscale decomposition of this construction: outer decisions on radial wait velocities and end positions optimize the expected long-term delay-power trade-off; consequently, inner decisions on angular wait velocities, service schedules, and UAV trajectories greedily minimize the instantaneous delay-power costs. Next, generalizing to UAV swarms via replication and consensus-driven command-and-control, this policy is embedded with spread maximization and conflict resolution heuristics. We demonstrate that our framework offers superior performance vis-\`a-vis average service latencies and average per-UAV power consumption: 11x faster data payload delivery relative to static UAV-relay deployments and 2x faster than a deep-Q network solution; remarkably, one relay with our scheme outclasses three relays under a joint successive convex approximation policy by 62%.
2022-09-16 06:09:34 - AiM: Taking Answers in Mind to Correct Chinese Cloze Tests in Educational Applications - *Yusen Zhang, Zhongli Li, Qingyu Zhou, Ziyi Liu, Chao Li, Mina Ma, Yunbo Cao, Hongzhi Liu* - `2208.12505v2` - [abs](http://arxiv.org/abs/2208.12505v2) - [pdf](http://arxiv.org/pdf/2208.12505v2) > To automatically correct handwritten assignments, the traditional approach is to use an OCR model to recognize characters and compare them to answers. The OCR model easily gets confused on recognizing handwritten Chinese characters, and the textual information of the answers is missing during the model inference. However, teachers always have these answers in mind to review and correct assignments. In this paper, we focus on the Chinese cloze tests correction and propose a multimodal approach (named AiM). The encoded representations of answers interact with the visual information of students' handwriting. Instead of predicting 'right' or 'wrong', we perform the sequence labeling on the answer text to infer which answer character differs from the handwritten content in a fine-grained way. We take samples of OCR datasets as the positive samples for this task, and develop a negative sample augmentation method to scale up the training data. Experimental results show that AiM outperforms OCR-based methods by a large margin. Extensive studies demonstrate the effectiveness of our multimodal approach.
2022-09-16 06:15:26 - CenterLineDet: Road Lane CenterLine Graph Detection With Vehicle-Mounted Sensors by Transformer for High-definition Map Creation - *Zhenhua Xu, Yuxuan Liu, Yuxiang Sun, Ming Liu, Lujia Wang* - `2209.07734v1` - [abs](http://arxiv.org/abs/2209.07734v1) - [pdf](http://arxiv.org/pdf/2209.07734v1) > With the rapid development of autonomous vehicles, there witnesses a booming demand for high-definition maps (HD maps) that provide reliable and robust prior information of static surroundings in autonomous driving scenarios. As one of the main high-level elements in the HD map, the road lane centerline is critical for downstream tasks, such as prediction and planning. Manually annotating lane centerline HD maps by human annotators is labor-intensive, expensive and inefficient, severely restricting the wide application and fast deployment of autonomous driving systems. Previous works seldom explore the centerline HD map mapping problem due to the complicated topology and severe overlapping issues of road centerlines. In this paper, we propose a novel method named CenterLineDet to create the lane centerline HD map automatically. CenterLineDet is trained by imitation learning and can effectively detect the graph of lane centerlines by iterations with vehicle-mounted sensors. Due to the application of the DETR-like transformer network, CenterLineDet can handle complicated graph topology, such as lane intersections. The proposed approach is evaluated on a large publicly available dataset Nuscenes, and the superiority of CenterLineDet is well demonstrated by the comparison results. This paper is accompanied by a demo video and a supplementary document that are available at \url{https://tonyxuqaq.github.io/projects/CenterLineDet/}.
2022-09-16 07:04:48 - SANCL: Multimodal Review Helpfulness Prediction with Selective Attention and Natural Contrastive Learning - *Wei Han, Hui Chen, Zhen Hai, Soujanya Poria, Lidong Bing* - `2209.05040v4` - [abs](http://arxiv.org/abs/2209.05040v4) - [pdf](http://arxiv.org/pdf/2209.05040v4) > With the boom of e-commerce, Multimodal Review Helpfulness Prediction (MRHP), which aims to sort product reviews according to the predicted helpfulness scores has become a research hotspot. Previous work on this task focuses on attention-based modality fusion, information integration, and relation modeling, which primarily exposes the following drawbacks: 1) the model may fail to capture the really essential information due to its indiscriminate attention formulation; 2) lack appropriate modeling methods that take full advantage of correlation among provided data. In this paper, we propose SANCL: Selective Attention and Natural Contrastive Learning for MRHP. SANCL adopts a probe-based strategy to enforce high attention weights on the regions of greater significance. It also constructs a contrastive learning framework based on natural matching properties in the dataset. Experimental results on two benchmark datasets with three categories show that SANCL achieves state-of-the-art baseline performance with lower memory consumption.
2022-09-16 12:07:37 - Deep learning for brain metastasis detection and segmentation in longitudinal MRI data - *Yixing Huang, Christoph Bert, Philipp Sommer, Benjamin Frey, Udo Gaipl, Luitpold V. Distel, Thomas Weissmann, Michael Uder, Manuel A. Schmidt, Arnd Dörfler, Andreas Maier, Rainer Fietkau, Florian Putz* - `2112.11833v5` - [abs](http://arxiv.org/abs/2112.11833v5) - [pdf](http://arxiv.org/pdf/2112.11833v5) > Brain metastases occur frequently in patients with metastatic cancer. Early and accurate detection of brain metastases is very essential for treatment planning and prognosis in radiation therapy. To improve brain metastasis detection performance with deep learning, a custom detection loss called volume-level sensitivity-specificity (VSS) is proposed, which rates individual metastasis detection sensitivity and specificity in (sub-)volume levels. As sensitivity and precision are always a trade-off in a metastasis level, either a high sensitivity or a high precision can be achieved by adjusting the weights in the VSS loss without decline in dice score coefficient for segmented metastases. To reduce metastasis-like structures being detected as false positive metastases, a temporal prior volume is proposed as an additional input of DeepMedic. The modified network is called DeepMedic+ for distinction. Our proposed VSS loss improves the sensitivity of brain metastasis detection for DeepMedic, increasing the sensitivity from 85.3% to 97.5%. Alternatively, it improves the precision from 69.1% to 98.7%. Comparing DeepMedic+ with DeepMedic with the same VSS loss, 44.4% of the false positive metastases are reduced in the high sensitivity model and the precision reaches 99.6% for the high specificity model. The mean dice coefficient for all metastases is about 0.81. With the ensemble of the high sensitivity and high specificity models, on average only 1.5 false positive metastases per patient needs further check, while the majority of true positive metastases are confirmed. The ensemble learning is able to distinguish high confidence true positive metastases from metastases candidates that require special expert review or further follow-up, being particularly well-fit to the requirements of expert support in real clinical practice.
2022-09-16 14:59:31 - Artificial Intelligence for In Silico Clinical Trials: A Review - *Zifeng Wang, Chufan Gao, Lucas M. Glass, Jimeng Sun* - `2209.09023v1` - [abs](http://arxiv.org/abs/2209.09023v1) - [pdf](http://arxiv.org/pdf/2209.09023v1) > A clinical trial is an essential step in drug development, which is often costly and time-consuming. In silico trials are clinical trials conducted digitally through simulation and modeling as an alternative to traditional clinical trials. AI-enabled in silico trials can increase the case group size by creating virtual cohorts as controls. In addition, it also enables automation and optimization of trial design and predicts the trial success rate. This article systematically reviews papers under three main topics: clinical simulation, individualized predictive modeling, and computer-aided trial design. We focus on how machine learning (ML) may be applied in these applications. In particular, we present the machine learning problem formulation and available data sources for each task. We end with discussing the challenges and opportunities of AI for in silico trials in real-world applications.
2022-09-16 15:48:35 - Causes of Catastrophic Forgetting in Class-Incremental Semantic Segmentation - *Tobias Kalb, Jürgen Beyerer* - `2209.08010v1` - [abs](http://arxiv.org/abs/2209.08010v1) - [pdf](http://arxiv.org/pdf/2209.08010v1) > Class-incremental learning for semantic segmentation (CiSS) is presently a highly researched field which aims at updating a semantic segmentation model by sequentially learning new semantic classes. A major challenge in CiSS is overcoming the effects of catastrophic forgetting, which describes the sudden drop of accuracy on previously learned classes after the model is trained on a new set of classes. Despite latest advances in mitigating catastrophic forgetting, the underlying causes of forgetting specifically in CiSS are not well understood. Therefore, in a set of experiments and representational analyses, we demonstrate that the semantic shift of the background class and a bias towards new classes are the major causes of forgetting in CiSS. Furthermore, we show that both causes mostly manifest themselves in deeper classification layers of the network, while the early layers of the model are not affected. Finally, we demonstrate how both causes are effectively mitigated utilizing the information contained in the background, with the help of knowledge distillation and an unbiased cross-entropy loss.
2022-09-16 16:50:41 - Self-Optimizing Feature Transformation - *Meng Xiao, Dongjie Wang, Yanjie Fu, Kunpeng Liu, Min Wu, Hui Xiong, Yuanchun Zhou* - `2209.08044v1` - [abs](http://arxiv.org/abs/2209.08044v1) - [pdf](http://arxiv.org/pdf/2209.08044v1) > Feature transformation aims to extract a good representation (feature) space by mathematically transforming existing features. It is crucial to address the curse of dimensionality, enhance model generalization, overcome data sparsity, and expand the availability of classic models. Current research focuses on domain knowledge-based feature engineering or learning latent representations; nevertheless, these methods are not entirely automated and cannot produce a traceable and optimal representation space. When rebuilding a feature space for a machine learning task, can these limitations be addressed concurrently? In this extension study, we present a self-optimizing framework for feature transformation. To achieve a better performance, we improved the preliminary work by (1) obtaining an advanced state representation for enabling reinforced agents to comprehend the current feature set better; and (2) resolving Q-value overestimation in reinforced agents for learning unbiased and effective policies. Finally, to make experiments more convincing than the preliminary work, we conclude by adding the outlier detection task with five datasets, evaluating various state representation approaches, and comparing different training strategies. Extensive experiments and case studies show that our work is more effective and superior.
2022-09-16 21:04:42 - Deep learning for reconstructing protein structures from cryo-EM density maps: recent advances and future directions - *Nabin Giri, Raj S. Roy, Jianlin Cheng* - `2209.08171v1` - [abs](http://arxiv.org/abs/2209.08171v1) - [pdf](http://arxiv.org/pdf/2209.08171v1) > Cryo-Electron Microscopy (cryo-EM) has emerged as a key technology to determine the structure of proteins, particularly large protein complexes and assemblies in recent years. A key challenge in cryo-EM data analysis is to automatically reconstruct accurate protein structures from cryo-EM density maps. In this review, we briefly overview various deep learning methods for building protein structures from cryo-EM density maps, analyze their impact, and discuss the challenges of preparing high-quality data sets for training deep learning models. Looking into the future, more advanced deep learning models of effectively integrating cryo-EM data with other sources of complementary data such as protein sequences and AlphaFold-predicted structures need to be developed to further advance the field.
2022-09-16 21:57:39 - Cell Attention Networks - *Lorenzo Giusti, Claudio Battiloro, Lucia Testa, Paolo Di Lorenzo, Stefania Sardellitti, Sergio Barbarossa* - `2209.08179v1` - [abs](http://arxiv.org/abs/2209.08179v1) - [pdf](http://arxiv.org/pdf/2209.08179v1) > Since their introduction, graph attention networks achieved outstanding results in graph representation learning tasks. However, these networks consider only pairwise relationships among nodes and then they are not able to fully exploit higher-order interactions present in many real world data-sets. In this paper, we introduce Cell Attention Networks (CANs), a neural architecture operating on data defined over the vertices of a graph, representing the graph as the 1-skeleton of a cell complex introduced to capture higher order interactions. In particular, we exploit the lower and upper neighborhoods, as encoded in the cell complex, to design two independent masked self-attention mechanisms, thus generalizing the conventional graph attention strategy. The approach used in CANs is hierarchical and it incorporates the following steps: i) a lifting algorithm that learns {\it edge features} from {\it node features}; ii) a cell attention mechanism to find the optimal combination of edge features over both lower and upper neighbors; iii) a hierarchical {\it edge pooling} mechanism to extract a compact meaningful set of features. The experimental results show that CAN is a low complexity strategy that compares favorably with state of the art results on graph-based learning tasks.
2022-09-17 02:23:09 - Neural Implicit Surface Reconstruction using Imaging Sonar - *Mohamad Qadri, Michael Kaess, Ioannis Gkioulekas* - `2209.08221v1` - [abs](http://arxiv.org/abs/2209.08221v1) - [pdf](http://arxiv.org/pdf/2209.08221v1) > We present a technique for dense 3D reconstruction of objects using an imaging sonar, also known as forward-looking sonar (FLS). Compared to previous methods that model the scene geometry as point clouds or volumetric grids, we represent the geometry as a neural implicit function. Additionally, given such a representation, we use a differentiable volumetric renderer that models the propagation of acoustic waves to synthesize imaging sonar measurements. We perform experiments on real and synthetic datasets and show that our algorithm reconstructs high-fidelity surface geometry from multi-view FLS images at much higher quality than was possible with previous techniques and without suffering from their associated memory overhead.
2022-09-17 07:23:57 - Complex Knowledge Base Question Answering: A Survey - *Yunshi Lan, Gaole He, Jinhao Jiang, Jing Jiang, Wayne Xin Zhao, Ji-Rong Wen* - `2108.06688v4` - [abs](http://arxiv.org/abs/2108.06688v4) - [pdf](http://arxiv.org/pdf/2108.06688v4) > Knowledge base question answering (KBQA) aims to answer a question over a knowledge base (KB). Early studies mainly focused on answering simple questions over KBs and achieved great success. However, their performance on complex questions is still far from satisfactory. Therefore, in recent years, researchers propose a large number of novel methods, which looked into the challenges of answering complex questions. In this survey, we review recent advances on KBQA with the focus on solving complex questions, which usually contain multiple subjects, express compound relations, or involve numerical operations. In detail, we begin with introducing the complex KBQA task and relevant background. Then, we describe benchmark datasets for complex KBQA task and introduce the construction process of these datasets. Next, we present two mainstream categories of methods for complex KBQA, namely semantic parsing-based (SP-based) methods and information retrieval-based (IR-based) methods. Specifically, we illustrate their procedures with flow designs and discuss their major differences and similarities. After that, we summarize the challenges that these two categories of methods encounter when answering complex questions, and explicate advanced solutions and techniques used in existing work. Finally, we conclude and discuss several promising directions related to complex KBQA for future research.
2022-09-17 13:34:17 - Non-Imaging Medical Data Synthesis for Trustworthy AI: A Comprehensive Survey - *Xiaodan Xing, Huanjun Wu, Lichao Wang, Iain Stenson, May Yong, Javier Del Ser, Simon Walsh, Guang Yang* - `2209.09239v1` - [abs](http://arxiv.org/abs/2209.09239v1) - [pdf](http://arxiv.org/pdf/2209.09239v1) > Data quality is the key factor for the development of trustworthy AI in healthcare. A large volume of curated datasets with controlled confounding factors can help improve the accuracy, robustness and privacy of downstream AI algorithms. However, access to good quality datasets is limited by the technical difficulty of data acquisition and large-scale sharing of healthcare data is hindered by strict ethical restrictions. Data synthesis algorithms, which generate data with a similar distribution as real clinical data, can serve as a potential solution to address the scarcity of good quality data during the development of trustworthy AI. However, state-of-the-art data synthesis algorithms, especially deep learning algorithms, focus more on imaging data while neglecting the synthesis of non-imaging healthcare data, including clinical measurements, medical signals and waveforms, and electronic healthcare records (EHRs). Thus, in this paper, we will review the synthesis algorithms, particularly for non-imaging medical data, with the aim of providing trustworthy AI in this domain. This tutorial-styled review paper will provide comprehensive descriptions of non-imaging medical data synthesis on aspects including algorithms, evaluations, limitations and future research directions.
2022-09-18 10:23:52 - Knowledge Base Question Answering: A Semantic Parsing Perspective - *Yu Gu, Vardaan Pahuja, Gong Cheng, Yu Su* - `2209.04994v3` - [abs](http://arxiv.org/abs/2209.04994v3) - [pdf](http://arxiv.org/pdf/2209.04994v3) > Recent advances in deep learning have greatly propelled the research on semantic parsing. Improvement has since been made in many downstream tasks, including natural language interface to web APIs, text-to-SQL generation, among others. However, despite the close connection shared with these tasks, research on question answering over knowledge bases (KBQA) has comparatively been progressing slowly. We identify and attribute this to two unique challenges of KBQA, schema-level complexity and fact-level complexity. In this survey, we situate KBQA in the broader literature of semantic parsing and give a comprehensive account of how existing KBQA approaches attempt to address the unique challenges. Regardless of the unique challenges, we argue that we can still take much inspiration from the literature of semantic parsing, which has been overlooked by existing research on KBQA. Based on our discussion, we can better understand the bottleneck of current KBQA research and shed light on promising directions for KBQA to keep up with the literature of semantic parsing, particularly in the era of pre-trained language models.
2022-09-18 12:00:44 - Explain and Conquer: Personalised Text-based Reviews to Achieve Transparency - *Iñigo López-Riobóo Botana, Verónica Bolón-Canedo, Bertha Guijarro-Berdiñas, Amparo Alonso-Betanzos* - `2205.01759v2` - [abs](http://arxiv.org/abs/2205.01759v2) - [pdf](http://arxiv.org/pdf/2205.01759v2) > There are many contexts in which dyadic data are present. Social networks are a well-known example. In these contexts, pairs of elements are linked building a network that reflects interactions. Explaining why these relationships are established is essential to obtain transparency, an increasingly important notion. These explanations are often presented using text, thanks to the spread of the natural language understanding tasks. Our aim is to represent and explain pairs established by any agent (e.g., a recommender system or a paid promotion mechanism), so that text-based personalisation is taken into account. We have focused on the TripAdvisor platform, considering the applicability to other dyadic data contexts. The items are a subset of users and restaurants and the interactions the reviews posted by these users. We propose the PTER (Personalised TExt-based Reviews) model. We predict, from the available reviews for a given restaurant, those that fit to the specific user interactions. PTER leverages the BERT (Bidirectional Encoders Representations from Transformers) transformer-encoder model. We customised a deep neural network following the feature-based approach, presenting a LTR (Learning To Rank) downstream task. We carried out several comparisons of our proposal with a random baseline and other models of the state of the art, following the EXTRA (EXplanaTion RAnking) benchmark. Our method outperforms other collaborative filtering proposals.
2022-09-18 21:36:12 - Achilles Heels for AGI/ASI via Decision Theoretic Adversaries - *Stephen Casper* - `2010.05418v7` - [abs](http://arxiv.org/abs/2010.05418v7) - [pdf](http://arxiv.org/pdf/2010.05418v7) > As progress in AI continues to advance, it is crucial to know how advanced systems will make choices and in what ways they may fail. Machines can already outsmart humans in some domains, and understanding how to safely build ones which may have capabilities at or above the human level is of particular concern. One might suspect that artificially generally intelligent (AGI) and artificially superintelligent (ASI) systems should be modeled as as something which humans, by definition, can't reliably outsmart. As a challenge to this assumption, this paper presents the Achilles Heel hypothesis which states that even a potentially superintelligent system may nonetheless have stable decision-theoretic delusions which cause them to make obviously irrational decisions in adversarial settings. In a survey of relevant dilemmas and paradoxes from the decision theory literature, a number of these potential Achilles Heels are discussed in context of this hypothesis. Several novel contributions are made toward understanding the ways in which these weaknesses might be implanted into a system.
2022-09-18 23:11:24 - PyTorch Geometric Signed Directed: A Software Package on Graph Neural Networks for Signed and Directed Graphs - *Yixuan He, Xitong Zhang, Junjie Huang, Benedek Rozemberczki, Mihai Cucuringu, Gesine Reinert* - `2202.10793v3` - [abs](http://arxiv.org/abs/2202.10793v3) - [pdf](http://arxiv.org/pdf/2202.10793v3) > Networks are ubiquitous in many real-world applications (e.g., social networks encoding trust/distrust relationships, correlation networks arising from time series data). While many networks are signed or directed, or both, there is a lack of unified software packages on graph neural networks (GNNs) specially designed for signed and directed networks. In this paper, we present PyTorch Geometric Signed Directed, a software package which fills this gap. Along the way, we also provide a brief review surveying typical tasks, loss functions and evaluation metrics in the analysis of signed and directed networks, discuss data used in related experiments, provide an overview of methods proposed, and evaluate the implemented methods with experiments. The deep learning framework consists of easy-to-use GNN models, synthetic and real-world data, as well as task-specific evaluation metrics and loss functions for signed and directed networks. As an extension library for PyTorch Geometric, our proposed software is maintained with open-source releases, detailed documentation, continuous integration, unit tests and code coverage checks. Our code is publicly available at \url{https://github.com/SherylHYX/pytorch_geometric_signed_directed}.
2022-09-18 23:16:47 - On the Whitney near extension problem, BMO, alignment of data, best approximation in algebraic geometry, manifold learning and their beautiful connections: A modern treatment - *Steven B. Damelin* - `2103.09748v6` - [abs](http://arxiv.org/abs/2103.09748v6) - [pdf](http://arxiv.org/pdf/2103.09748v6) > This paper provides fascinating connections between several mathematical problems which lie on the intersection of several mathematics subjects, namely algebraic geometry, approximation theory, complex-harmonic analysis and high dimensional data science. Modern techniques in algebraic geometry, approximation theory, computational harmonic analysis and extensions develop the first of its kind, a unified framework which allows for a simultaneous study of labeled and unlabeled near alignment data problems in of $\mathbb R^D$ with the near isometry extension problem for discrete and non-discrete subsets of $\mathbb R^D$ with certain geometries. In addition, the paper surveys related work on clustering, dimension reduction, manifold learning, vision as well as minimal energy partitions, discrepancy and min-max optimization. Numerous open problems are given.
2022-09-19 00:31:29 - Automated MeSH Term Suggestion for Effective Query Formulation in Systematic Reviews Literature Search - *Shuai Wang, Harrisen Scells, Bevan Koopman, Guido Zuccon* - `2209.08687v1` - [abs](http://arxiv.org/abs/2209.08687v1) - [pdf](http://arxiv.org/pdf/2209.08687v1) > High-quality medical systematic reviews require comprehensive literature searches to ensure the recommendations and outcomes are sufficiently reliable. Indeed, searching for relevant medical literature is a key phase in constructing systematic reviews and often involves domain (medical researchers) and search (information specialists) experts in developing the search queries. Queries in this context are highly complex, based on Boolean logic, include free-text terms and index terms from standardised terminologies (e.g., the Medical Subject Headings (MeSH) thesaurus), and are difficult and time-consuming to build. The use of MeSH terms, in particular, has been shown to improve the quality of the search results. However, identifying the correct MeSH terms to include in a query is difficult: information experts are often unfamiliar with the MeSH database and unsure about the appropriateness of MeSH terms for a query. Naturally, the full value of the MeSH terminology is often not fully exploited. This article investigates methods to suggest MeSH terms based on an initial Boolean query that includes only free-text terms. In this context, we devise lexical and pre-trained language models based methods. These methods promise to automatically identify highly effective MeSH terms for inclusion in a systematic review query. Our study contributes an empirical evaluation of several MeSH term suggestion methods. We further contribute an extensive analysis of MeSH term suggestions for each method and how these suggestions impact the effectiveness of Boolean queries.
2022-09-19 01:13:42 - LED down the rabbit hole: exploring the potential of global attention for biomedical multi-document summarisation - *Yulia Otmakhova, Hung Thinh Truong, Timothy Baldwin, Trevor Cohn, Karin Verspoor, Jey Han Lau* - `2209.08698v1` - [abs](http://arxiv.org/abs/2209.08698v1) - [pdf](http://arxiv.org/pdf/2209.08698v1) > In this paper we report on our submission to the Multidocument Summarisation for Literature Review (MSLR) shared task. Specifically, we adapt PRIMERA (Xiao et al., 2022) to the biomedical domain by placing global attention on important biomedical entities in several ways. We analyse the outputs of the 23 resulting models, and report patterns in the results related to the presence of additional global attention, number of training steps, and the input configuration.
2022-09-19 02:04:49 - MDA GAN: Adversarial-Learning-based 3-D Seismic Data Interpolation and Reconstruction for Complex Missing - *Yimin Dou, Kewen Li, Hongjie Duan, Timing Li, Lin Dong, Zongchao Huang* - `2204.03197v4` - [abs](http://arxiv.org/abs/2204.03197v4) - [pdf](http://arxiv.org/pdf/2204.03197v4) > The interpolation and reconstruction of missing traces is a crucial step in seismic data processing, moreover it is also a highly ill-posed problem, especially for complex cases such as high-ratio random discrete missing, continuous missing and missing in fault-rich or salt body surveys. These complex cases are rarely mentioned in current works. To cope with complex missing cases, we propose Multi-Dimensional Adversarial GAN (MDA GAN), a novel 3-D GAN framework. It keeps anisotropy and spatial continuity of the data after 3D complex missing reconstruction using three discriminators. The feature stitching module is designed and embedded in the generator to retain more information of the input data. The Tanh cross entropy (TCE) loss is derived, which provides the generator with the optimal reconstruction gradient to make the generated data smoother and continuous. We experimentally verified the effectiveness of the individual components of the study and then tested the method on multiple publicly available data. The method achieves reasonable reconstructions for up to 95% of random discrete missing and 100 traces of continuous missing. In fault and salt body enriched surveys, MDA GAN still yields promising results for complex cases. Experimentally it has been demonstrated that our method achieves better performance than other methods in both simple and complex cases.https://github.com/douyimin/MDA_GAN
2022-09-19 02:47:48 - Ensembles of Compact, Region-specific & Regularized Spiking Neural Networks for Scalable Place Recognition - *Somayeh Hussaini, Michael Milford, Tobias Fischer* - `2209.08723v1` - [abs](http://arxiv.org/abs/2209.08723v1) - [pdf](http://arxiv.org/pdf/2209.08723v1) > Spiking neural networks have significant potential utility in robotics due to their high energy efficiency on specialized hardware, but proof-of-concept implementations have not yet typically achieved competitive performance or capability with conventional approaches. In this paper, we tackle one of the key practical challenges of scalability by introducing a novel modular ensemble network approach, where compact, localized spiking networks each learn and are solely responsible for recognizing places in a local region of the environment only. This modular approach creates a highly scalable system. However, it comes with a high-performance cost where a lack of global regularization at deployment time leads to hyperactive neurons that erroneously respond to places outside their learned region. Our second contribution introduces a regularization approach that detects and removes these problematic hyperactive neurons during the initial environmental learning phase. We evaluate this new scalable modular system on benchmark localization datasets Nordland and Oxford RobotCar, with comparisons to both standard techniques NetVLAD and SAD, and a previous spiking neural network system. Our system substantially outperforms the previous SNN system on its small dataset, but also maintains performance on 27 times larger benchmark datasets where the operation of the previous system is computationally infeasible, and performs competitively with the conventional localization systems.
2022-09-19 13:03:02 - An Overview on the Generation and Detection of Synthetic and Manipulated Satellite Images - *Lydia Abady, Edoardo Daniele Cannas, Paolo Bestagini, Benedetta Tondi, Stefano Tubaro, Mauro Barni* - `2209.08984v1` - [abs](http://arxiv.org/abs/2209.08984v1) - [pdf](http://arxiv.org/pdf/2209.08984v1) > Due to the reduction of technological costs and the increase of satellites launches, satellite images are becoming more popular and easier to obtain. Besides serving benevolent purposes, satellite data can also be used for malicious reasons such as misinformation. As a matter of fact, satellite images can be easily manipulated relying on general image editing tools. Moreover, with the surge of Deep Neural Networks (DNNs) that can generate realistic synthetic imagery belonging to various domains, additional threats related to the diffusion of synthetically generated satellite images are emerging. In this paper, we review the State of the Art (SOTA) on the generation and manipulation of satellite images. In particular, we focus on both the generation of synthetic satellite imagery from scratch, and the semantic manipulation of satellite images by means of image-transfer technologies, including the transformation of images obtained from one type of sensor to another one. We also describe forensic detection techniques that have been researched so far to classify and detect synthetic image forgeries. While we focus mostly on forensic techniques explicitly tailored to the detection of AI-generated synthetic contents, we also review some methods designed for general splicing detection, which can in principle also be used to spot AI manipulate images
2022-09-19 14:50:48 - Deep Metric Learning with Chance Constraints - *Yeti Z. Gurbuz, Ogul Can, A. Aydin Alatan* - `2209.09060v1` - [abs](http://arxiv.org/abs/2209.09060v1) - [pdf](http://arxiv.org/pdf/2209.09060v1) > Deep metric learning (DML) aims to minimize empirical expected loss of the pairwise intra-/inter- class proximity violations in the embedding image. We relate DML to feasibility problem of finite chance constraints. We show that minimizer of proxy-based DML satisfies certain chance constraints, and that the worst case generalization performance of the proxy-based methods can be characterized by the radius of the smallest ball around a class proxy to cover the entire domain of the corresponding class samples, suggesting multiple proxies per class helps performance. To provide a scalable algorithm as well as exploiting more proxies, we consider the chance constraints implied by the minimizers of proxy-based DML instances and reformulate DML as finding a feasible point in intersection of such constraints, resulting in a problem to be approximately solved by iterative projections. Simply put, we repeatedly train a regularized proxy-based loss and re-initialize the proxies with the embeddings of the deliberately selected new samples. We apply our method with the well-accepted losses and evaluate on four popular benchmark datasets for image retrieval. Outperforming state-of-the-art, our method consistently improves the performance of the applied losses. Code is available at: https://github.com/yetigurbuz/ccp-dml
2022-09-19 14:54:41 - Overview of the SV-Ident 2022 Shared Task on Survey Variable Identification in Social Science Publications - *Tornike Tsereteli, Yavuz Selim Kartal, Simone Paolo Ponzetto, Andrea Zielinski, Kai Eckert, Philipp Mayr* - `2209.09062v1` - [abs](http://arxiv.org/abs/2209.09062v1) - [pdf](http://arxiv.org/pdf/2209.09062v1) > In this paper, we provide an overview of the SV-Ident shared task as part of the 3rd Workshop on Scholarly Document Processing (SDP) at COLING 2022. In the shared task, participants were provided with a sentence and a vocabulary of variables, and asked to identify which variables, if any, are mentioned in individual sentences from scholarly documents in full text. Two teams made a total of 9 submissions to the shared task leaderboard. While none of the teams improve on the baseline systems, we still draw insights from their submissions. Furthermore, we provide a detailed evaluation. Data and baselines for our shared task are freely available at https://github.com/vadis-project/sv-ident
2022-09-19 16:23:43 - RESHAPE: Explaining Accounting Anomalies in Financial Statement Audits by enhancing SHapley Additive exPlanations - *Ricardo Müller, Marco Schreyer, Timur Sattarov, Damian Borth* - `2209.09157v1` - [abs](http://arxiv.org/abs/2209.09157v1) - [pdf](http://arxiv.org/pdf/2209.09157v1) > Detecting accounting anomalies is a recurrent challenge in financial statement audits. Recently, novel methods derived from Deep-Learning (DL) have been proposed to audit the large volumes of a statement's underlying accounting records. However, due to their vast number of parameters, such models exhibit the drawback of being inherently opaque. At the same time, the concealing of a model's inner workings often hinders its real-world application. This observation holds particularly true in financial audits since auditors must reasonably explain and justify their audit decisions. Nowadays, various Explainable AI (XAI) techniques have been proposed to address this challenge, e.g., SHapley Additive exPlanations (SHAP). However, in unsupervised DL as often applied in financial audits, these methods explain the model output at the level of encoded variables. As a result, the explanations of Autoencoder Neural Networks (AENNs) are often hard to comprehend by human auditors. To mitigate this drawback, we propose (RESHAPE), which explains the model output on an aggregated attribute-level. In addition, we introduce an evaluation framework to compare the versatility of XAI methods in auditing. Our experimental results show empirical evidence that RESHAPE results in versatile explanations compared to state-of-the-art baselines. We envision such attribute-level explanations as a necessary next step in the adoption of unsupervised DL techniques in financial auditing.
2022-09-19 16:38:40 - QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension - *Anna Rogers, Matt Gardner, Isabelle Augenstein* - `2107.12708v2` - [abs](http://arxiv.org/abs/2107.12708v2) - [pdf](http://arxiv.org/pdf/2107.12708v2) > Alongside huge volumes of research on deep learning models in NLP in the recent years, there has been also much work on benchmark datasets needed to track modeling progress. Question answering and reading comprehension have been particularly prolific in this regard, with over 80 new datasets appearing in the past two years. This study is the largest survey of the field to date. We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae for future work. We further discuss the current classifications of "skills" that question answering/reading comprehension systems are supposed to acquire, and propose a new taxonomy. The supplementary materials survey the current multilingual resources and monolingual resources for languages other than English, and we discuss the implications of over-focusing on English. The study is aimed at both practitioners looking for pointers to the wealth of existing data, and at researchers working on new resources.
2022-09-20 00:45:18 - Data Representativeness in Accessibility Datasets: A Meta-Analysis - *Rie Kamikubo, Lining Wang, Crystal Marte, Amnah Mahmood, Hernisa Kacorri* - `2207.08037v2` - [abs](http://arxiv.org/abs/2207.08037v2) - [pdf](http://arxiv.org/pdf/2207.08037v2) > As data-driven systems are increasingly deployed at scale, ethical concerns have arisen around unfair and discriminatory outcomes for historically marginalized groups that are underrepresented in training data. In response, work around AI fairness and inclusion has called for datasets that are representative of various demographic groups. In this paper, we contribute an analysis of the representativeness of age, gender, and race & ethnicity in accessibility datasets - datasets sourced from people with disabilities and older adults - that can potentially play an important role in mitigating bias for inclusive AI-infused applications. We examine the current state of representation within datasets sourced by people with disabilities by reviewing publicly-available information of 190 datasets, we call these accessibility datasets. We find that accessibility datasets represent diverse ages, but have gender and race representation gaps. Additionally, we investigate how the sensitive and complex nature of demographic variables makes classification difficult and inconsistent (e.g., gender, race & ethnicity), with the source of labeling often unknown. By reflecting on the current challenges and opportunities for representation of disabled data contributors, we hope our effort expands the space of possibility for greater inclusion of marginalized communities in AI-infused systems.
2022-09-20 06:20:45 - Generating Persuasive Responses to Customer Reviews with Multi-Source Prior Knowledge in E-commerce - *Bo Chen, Jiayi Liu, Mieradilijiang Maimaiti, Xing Gao, Ji Zhang* - `2209.09497v1` - [abs](http://arxiv.org/abs/2209.09497v1) - [pdf](http://arxiv.org/pdf/2209.09497v1) > Customer reviews usually contain much information about one's online shopping experience. While positive reviews are beneficial to the stores, negative ones will largely influence consumers' decision and may lead to a decline in sales. Therefore, it is of vital importance to carefully and persuasively reply to each negative review and minimize its disadvantageous effect. Recent studies consider leveraging generation models to help the sellers respond. However, this problem is not well-addressed as the reviews may contain multiple aspects of issues which should be resolved accordingly and persuasively. In this work, we propose a Multi-Source Multi-Aspect Attentive Generation model for persuasive response generation. Various sources of information are appropriately obtained and leveraged by the proposed model for generating more informative and persuasive responses. A multi-aspect attentive network is proposed to automatically attend to different aspects in a review and ensure most of the issues are tackled. Extensive experiments on two real-world datasets, demonstrate that our approach outperforms the state-of-the-art methods and online tests prove that our deployed system significantly enhances the efficiency of the stores' dealing with negative reviews.
2022-09-20 07:10:45 - Responsible-AI-by-Design: a Pattern Collection for Designing Responsible AI Systems - *Qinghua Lu, Liming Zhu, Xiwei Xu, Jon Whittle* - `2203.00905v2` - [abs](http://arxiv.org/abs/2203.00905v2) - [pdf](http://arxiv.org/pdf/2203.00905v2) > Although AI has significant potential to transform society, there are serious concerns about its ability to behave and make decisions responsibly. Many ethical regulations, principles, and guidelines for responsible AI have been issued recently. However, these principles are high-level and difficult to put into practice. In the meantime much effort has been put into responsible AI from the algorithm perspective, but they are limited to a small subset of ethical principles amenable to mathematical analysis. Responsible AI issues go beyond data and algorithms and are often at the system-level crosscutting many system components and the entire software engineering lifecycle. Based on the result of a systematic literature review, this paper identifies one missing element as the system-level guidance - how to design the architecture of responsible AI systems. We present a summary of design patterns that can be embedded into the AI systems as product features to contribute to responsible-AI-by-design.
2022-09-20 07:41:24 - Review of data types and model dimensionality for cardiac DTI SMS-related artefact removal - *Michael Tanzer, Sea Hee Yook, Guang Yang, Daniel Rueckert, Sonia Nielles-Vallespin* - `2209.09522v1` - [abs](http://arxiv.org/abs/2209.09522v1) - [pdf](http://arxiv.org/pdf/2209.09522v1) > As diffusion tensor imaging (DTI) gains popularity in cardiac imaging due to its unique ability to non-invasively assess the cardiac microstructure, deep learning-based Artificial Intelligence is becoming a crucial tool in mitigating some of its drawbacks, such as the long scan times. As it often happens in fast-paced research environments, a lot of emphasis has been put on showing the capability of deep learning while often not enough time has been spent investigating what input and architectural properties would benefit cardiac DTI acceleration the most. In this work, we compare the effect of several input types (magnitude images vs complex images), multiple dimensionalities (2D vs 3D operations), and multiple input types (single slice vs multi-slice) on the performance of a model trained to remove artefacts caused by a simultaneous multi-slice (SMS) acquisition. Despite our initial intuition, our experiments show that, for a fixed number of parameters, simpler 2D real-valued models outperform their more advanced 3D or complex counterparts. The best performance is although obtained by a real-valued model trained using both the magnitude and phase components of the acquired data. We believe this behaviour to be due to real-valued models making better use of the lower number of parameters, and to 3D models not being able to exploit the spatial information because of the low SMS acceleration factor used in our experiments.
2022-09-20 16:49:56 - X-Risk Analysis for AI Research - *Dan Hendrycks, Mantas Mazeika* - `2206.05862v7` - [abs](http://arxiv.org/abs/2206.05862v7) - [pdf](http://arxiv.org/pdf/2206.05862v7) > Artificial intelligence (AI) has the potential to greatly improve society, but as with any powerful technology, it comes with heightened risks and responsibilities. Current AI research lacks a systematic discussion of how to manage long-tail risks from AI systems, including speculative long-term risks. Keeping in mind the potential benefits of AI, there is some concern that building ever more intelligent and powerful AI systems could eventually result in systems that are more powerful than us; some say this is like playing with fire and speculate that this could create existential risks (x-risks). To add precision and ground these discussions, we provide a guide for how to analyze AI x-risk, which consists of three parts: First, we review how systems can be made safer today, drawing on time-tested concepts from hazard analysis and systems safety that have been designed to steer large processes in safer directions. Next, we discuss strategies for having long-term impacts on the safety of future systems. Finally, we discuss a crucial concept in making AI systems safer by improving the balance between safety and general capabilities. We hope this document and the presented concepts and tools serve as a useful guide for understanding how to analyze AI x-risk.
2022-09-20 17:44:48 - Computational Sarcasm Analysis on Social Media: A Systematic Review - *Faria Binte Kader, Nafisa Hossain Nujat, Tasmia Binte Sogir, Mohsinul Kabir, Hasan Mahmud, Kamrul Hasan* - `2209.06170v2` - [abs](http://arxiv.org/abs/2209.06170v2) - [pdf](http://arxiv.org/pdf/2209.06170v2) > Sarcasm can be defined as saying or writing the opposite of what one truly wants to express, usually to insult, irritate, or amuse someone. Because of the obscure nature of sarcasm in textual data, detecting it is difficult and of great interest to the sentiment analysis research community. Though the research in sarcasm detection spans more than a decade, some significant advancements have been made recently, including employing unsupervised pre-trained transformers in multimodal environments and integrating context to identify sarcasm. In this study, we aim to provide a brief overview of recent advancements and trends in computational sarcasm research for the English language. We describe relevant datasets, methodologies, trends, issues, challenges, and tasks relating to sarcasm that are beyond detection. Our study provides well-summarized tables of sarcasm datasets, sarcastic features and their extraction methods, and performance analysis of various approaches which can help researchers in related domains understand current state-of-the-art practices in sarcasm detection.
2022-09-20 18:35:32 - Connecting Algorithmic Research and Usage Contexts: A Perspective of Contextualized Evaluation for Explainable AI - *Q. Vera Liao, Yunfeng Zhang, Ronny Luss, Finale Doshi-Velez, Amit Dhurandhar* - `2206.10847v3` - [abs](http://arxiv.org/abs/2206.10847v3) - [pdf](http://arxiv.org/pdf/2206.10847v3) > Recent years have seen a surge of interest in the field of explainable AI (XAI), with a plethora of algorithms proposed in the literature. However, a lack of consensus on how to evaluate XAI hinders the advancement of the field. We highlight that XAI is not a monolithic set of technologies -- researchers and practitioners have begun to leverage XAI algorithms to build XAI systems that serve different usage contexts, such as model debugging and decision-support. Algorithmic research of XAI, however, often does not account for these diverse downstream usage contexts, resulting in limited effectiveness or even unintended consequences for actual users, as well as difficulties for practitioners to make technical choices. We argue that one way to close the gap is to develop evaluation methods that account for different user requirements in these usage contexts. Towards this goal, we introduce a perspective of contextualized XAI evaluation by considering the relative importance of XAI evaluation criteria for prototypical usage contexts of XAI. To explore the context dependency of XAI evaluation criteria, we conduct two survey studies, one with XAI topical experts and another with crowd workers. Our results urge for responsible AI research with usage-informed evaluation practices, and provide a nuanced understanding of user requirements for XAI in different usage contexts.
2022-09-21 04:34:17 - A Comprehensive Survey on Trustworthy Recommender Systems - *Wenqi Fan, Xiangyu Zhao, Xiao Chen, Jingran Su, Jingtong Gao, Lin Wang, Qidong Liu, Yiqi Wang, Han Xu, Lei Chen, Qing Li* - `2209.10117v1` - [abs](http://arxiv.org/abs/2209.10117v1) - [pdf](http://arxiv.org/pdf/2209.10117v1) > As one of the most successful AI-powered applications, recommender systems aim to help people make appropriate decisions in an effective and efficient way, by providing personalized suggestions in many aspects of our lives, especially for various human-oriented online services such as e-commerce platforms and social media sites. In the past few decades, the rapid developments of recommender systems have significantly benefited human by creating economic value, saving time and effort, and promoting social good. However, recent studies have found that data-driven recommender systems can pose serious threats to users and society, such as spreading fake news to manipulate public opinion in social media sites, amplifying unfairness toward under-represented groups or individuals in job matching services, or inferring privacy information from recommendation results. Therefore, systems' trustworthiness has been attracting increasing attention from various aspects for mitigating negative impacts caused by recommender systems, so as to enhance the public's trust towards recommender systems techniques. In this survey, we provide a comprehensive overview of Trustworthy Recommender systems (TRec) with a specific focus on six of the most important aspects; namely, Safety & Robustness, Nondiscrimination & Fairness, Explainability, Privacy, Environmental Well-being, and Accountability & Auditability. For each aspect, we summarize the recent related technologies and discuss potential research directions to help achieve trustworthy recommender systems in the future.
2022-09-21 05:38:23 - A Systematic Literature Review of Soft Computing Techniques for Software Maintainability Prediction: State-of-the-Art, Challenges and Future Directions - *Gokul Yenduri, Thippa Reddy Gadekallu* - `2209.10131v1` - [abs](http://arxiv.org/abs/2209.10131v1) - [pdf](http://arxiv.org/pdf/2209.10131v1) > The software is changing rapidly with the invention of advanced technologies and methodologies. The ability to rapidly and successfully upgrade software in response to changing business requirements is more vital than ever. For the long-term management of software products, measuring software maintainability is crucial. The use of soft computing techniques for software maintainability prediction has shown immense promise in software maintenance process by providing accurate prediction of software maintainability. To better understand the role of soft computing techniques for software maintainability prediction, we aim to provide a systematic literature review of soft computing techniques for software maintainability prediction. Firstly, we provide a detailed overview of software maintainability. Following this, we explore the fundamentals of software maintainability and the reasons for adopting soft computing methodologies for predicting software maintainability. Later, we examine the soft computing approaches employed in the process of software maintainability prediction. Furthermore, we discuss the difficulties and potential solutions associated with the use of soft computing techniques to predict software maintainability. Finally, we conclude the review with some promising future directions to drive further research innovations and developments in this promising area.
2022-09-21 05:54:13 - Recipe Generation from Unsegmented Cooking Videos - *Taichi Nishimura, Atsushi Hashimoto, Yoshitaka Ushiku, Hirotaka Kameko, Shinsuke Mori* - `2209.10134v1` - [abs](http://arxiv.org/abs/2209.10134v1) - [pdf](http://arxiv.org/pdf/2209.10134v1) > This paper tackles recipe generation from unsegmented cooking videos, a task that requires agents to (1) extract key events in completing the dish and (2) generate sentences for the extracted events. Our task is similar to dense video captioning (DVC), which aims at detecting events thoroughly and generating sentences for them. However, unlike DVC, in recipe generation, recipe story awareness is crucial, and a model should output an appropriate number of key events in the correct order. We analyze the output of the DVC model and observe that although (1) several events are adoptable as a recipe story, (2) the generated sentences for such events are not grounded in the visual content. Based on this, we hypothesize that we can obtain correct recipes by selecting oracle events from the output events of the DVC model and re-generating sentences for them. To achieve this, we propose a novel transformer-based joint approach of training event selector and sentence generator for selecting oracle events from the outputs of the DVC model and generating grounded sentences for the events, respectively. In addition, we extend the model by including ingredients to generate more accurate recipes. The experimental results show that the proposed method outperforms state-of-the-art DVC models. We also confirm that, by modeling the recipe in a story-aware manner, the proposed model output the appropriate number of events in the correct order.
2022-09-21 07:06:46 - RNGDet++: Road Network Graph Detection by Transformer with Instance Segmentation and Multi-scale Features Enhancement - *Zhenhua Xu, Yuxuan Liu, Yuxiang Sun, Ming Liu, Lujia Wang* - `2209.10150v1` - [abs](http://arxiv.org/abs/2209.10150v1) - [pdf](http://arxiv.org/pdf/2209.10150v1) > The graph structure of road networks is critical for downstream tasks of autonomous driving systems, such as global planning, motion prediction and control. In the past, the road network graph is usually manually annotated by human experts, which is time-consuming and labor-intensive. To obtain the road network graph with better effectiveness and efficiency, automatic approaches for road network graph detection are required. Previous works either post-process semantic segmentation maps or propose graph-based algorithms to directly predict the road network graph. However, previous works suffer from hard-coded heuristic processing algorithms and inferior final performance. To enhance the previous SOTA (State-of-the-Art) approach RNGDet, we add an instance segmentation head to better supervise the model training, and enable the model to leverage multi-scale features of the backbone network. Since the new proposed approach is improved from RNGDet, it is named RNGDet++. All approaches are evaluated on a large publicly available dataset. RNGDet++ outperforms baseline models on almost all metrics scores. It improves the topology correctness APLS (Average Path Length Similarity) by around 3\%. The demo video and supplementary materials are available on our project page \url{https://tonyxuqaq.github.io/projects/RNGDetPlusPlus/}.
2022-09-21 07:10:44 - Review On Deep Learning Technique For Underwater Object Detection - *Radhwan Adnan Dakhil, Ali Retha Hasoon Khayeat* - `2209.10151v1` - [abs](http://arxiv.org/abs/2209.10151v1) - [pdf](http://arxiv.org/pdf/2209.10151v1) > Repair and maintenance of underwater structures as well as marine science rely heavily on the results of underwater object detection, which is a crucial part of the image processing workflow. Although many computer vision-based approaches have been presented, no one has yet developed a system that reliably and accurately detects and categorizes objects and animals found in the deep sea. This is largely due to obstacles that scatter and absorb light in an underwater setting. With the introduction of deep learning, scientists have been able to address a wide range of issues, including safeguarding the marine ecosystem, saving lives in an emergency, preventing underwater disasters, and detecting, spooring, and identifying underwater targets. However, the benefits and drawbacks of these deep learning systems remain unknown. Therefore, the purpose of this article is to provide an overview of the dataset that has been utilized in underwater object detection and to present a discussion of the advantages and disadvantages of the algorithms employed for this purpose.
2022-09-21 08:23:40 - Chatbots for Mental Health Support: Exploring the Impact of Emohaa on Reducing Mental Distress in China - *Sahand Sabour, Wen Zhang, Xiyao Xiao, Yuwei Zhang, Yinhe Zheng, Jiaxin Wen, Jialu Zhao, Minlie Huang* - `2209.10183v1` - [abs](http://arxiv.org/abs/2209.10183v1) - [pdf](http://arxiv.org/pdf/2209.10183v1) > The growing demand for mental health support has highlighted the importance of conversational agents as human supporters worldwide and in China. These agents could increase availability and reduce the relative costs of mental health support. The provided support can be divided into two main types: cognitive and emotional support. Existing work on this topic mainly focuses on constructing agents that adopt Cognitive Behavioral Therapy (CBT) principles. Such agents operate based on pre-defined templates and exercises to provide cognitive support. However, research on emotional support using such agents is limited. In addition, most of the constructed agents operate in English, highlighting the importance of conducting such studies in China. In this study, we analyze the effectiveness of Emohaa in reducing symptoms of mental distress. Emohaa is a conversational agent that provides cognitive support through CBT-based exercises and guided conversations. It also emotionally supports users by enabling them to vent their desired emotional problems. The study included 134 participants, split into three groups: Emohaa (CBT-based), Emohaa (Full), and control. Experimental results demonstrated that compared to the control group, participants who used Emohaa experienced considerably more significant improvements in symptoms of mental distress. We also found that adding the emotional support agent had a complementary effect on such improvements, mainly depression and insomnia. Based on the obtained results and participants' satisfaction with the platform, we concluded that Emohaa is a practical and effective tool for reducing mental distress.
2022-09-21 11:54:00 - AirFi: Empowering WiFi-based Passive Human Gesture Recognition to Unseen Environment via Domain Generalization - *Dazhuo Wang, Jianfei Yang, Wei Cui, Lihua Xie, Sumei Sun* - `2209.10285v1` - [abs](http://arxiv.org/abs/2209.10285v1) - [pdf](http://arxiv.org/pdf/2209.10285v1) > WiFi-based smart human sensing technology enabled by Channel State Information (CSI) has received great attention in recent years. However, CSI-based sensing systems suffer from performance degradation when deployed in different environments. Existing works solve this problem by domain adaptation using massive unlabeled high-quality data from the new environment, which is usually unavailable in practice. In this paper, we propose a novel augmented environment-invariant robust WiFi gesture recognition system named AirFi that deals with the issue of environment dependency from a new perspective. The AirFi is a novel domain generalization framework that learns the critical part of CSI regardless of different environments and generalizes the model to unseen scenarios, which does not require collecting any data for adaptation to the new environment. AirFi extracts the common features from several training environment settings and minimizes the distribution differences among them. The feature is further augmented to be more robust to environments. Moreover, the system can be further improved by few-shot learning techniques. Compared to state-of-the-art methods, AirFi is able to work in different environment settings without acquiring any CSI data from the new environment. The experimental results demonstrate that our system remains robust in the new environment and outperforms the compared systems.
2022-09-21 12:07:16 - Fast Few shot Self-attentive Semi-supervised Political Inclination Prediction - *Souvic Chakraborty, Pawan Goyal, Animesh Mukherjee* - `2209.10292v1` - [abs](http://arxiv.org/abs/2209.10292v1) - [pdf](http://arxiv.org/pdf/2209.10292v1) > With the rising participation of the common mass in social media, it is increasingly common now for policymakers/journalists to create online polls on social media to understand the political leanings of people in specific locations. The caveat here is that only influential people can make such an online polling and reach out at a mass scale. Further, in such cases, the distribution of voters is not controllable and may be, in fact, biased. On the other hand,if we can interpret the publicly available data over social media to probe the political inclination of users, we will be able to have controllable insights about the survey population, keep the cost of survey low and also collect publicly available data without involving the concerned persons. Hence we introduce a self-attentive semi-supervised framework for political inclination detection to further that objective. The advantage of our model is that it neither needs huge training data nor does it need to store social network parameters. Nevertheless, it achieves an accuracy of 93.7\% with no annotated data; further, with only a few annotated examples per class it achieves competitive performance. We found that the model is highly efficient even in resource-constrained settings, and insights drawn from its predictions match the manual survey outcomes when applied to diverse real-life scenarios.
2022-09-21 12:13:18 - Artificial Intelligence-Based Image Reconstruction in Cardiac Magnetic Resonance - *Chen Qin, Daniel Rueckert* - `2209.10298v1` - [abs](http://arxiv.org/abs/2209.10298v1) - [pdf](http://arxiv.org/pdf/2209.10298v1) > Artificial intelligence (AI) and Machine Learning (ML) have shown great potential in improving the medical imaging workflow, from image acquisition and reconstruction to disease diagnosis and treatment. Particularly, in recent years, there has been a significant growth in the use of AI and ML algorithms, especially Deep Learning (DL) based methods, for medical image reconstruction. DL techniques have shown to be competitive and often superior over conventional reconstruction methods in terms of both reconstruction quality and computational efficiency. The use of DL-based image reconstruction also provides promising opportunities to transform the way cardiac images are acquired and reconstructed. In this chapter, we will review recent advances in DL-based reconstruction techniques for cardiac imaging, with emphasis on cardiac magnetic resonance (CMR) image reconstruction. We mainly focus on supervised DL methods for the application, including image post-processing techniques, model-driven approaches and k-space based methods. Current limitations, challenges and future opportunities of DL for cardiac image reconstruction are also discussed.
2022-09-21 12:30:05 - Deep Learning for Medical Image Segmentation: Tricks, Challenges and Future Directions - *Dong Zhang, Yi Lin, Hao Chen, Zhuotao Tian, Xin Yang, Jinhui Tang, Kwang Ting Cheng* - `2209.10307v1` - [abs](http://arxiv.org/abs/2209.10307v1) - [pdf](http://arxiv.org/pdf/2209.10307v1) > Over the past few years, the rapid development of deep learning technologies for computer vision has greatly promoted the performance of medical image segmentation (MedISeg). However, the recent MedISeg publications usually focus on presentations of the major contributions (e.g., network architectures, training strategies, and loss functions) while unwittingly ignoring some marginal implementation details (also known as "tricks"), leading to a potential problem of the unfair experimental result comparisons. In this paper, we collect a series of MedISeg tricks for different model implementation phases (i.e., pre-training model, data pre-processing, data augmentation, model implementation, model inference, and result post-processing), and experimentally explore the effectiveness of these tricks on the consistent baseline models. Compared to paper-driven surveys that only blandly focus on the advantages and limitation analyses of segmentation models, our work provides a large number of solid experiments and is more technically operable. With the extensive experimental results on both the representative 2D and 3D medical image datasets, we explicitly clarify the effect of these tricks. Moreover, based on the surveyed tricks, we also open-sourced a strong MedISeg repository, where each of its components has the advantage of plug-and-play. We believe that this milestone work not only completes a comprehensive and complementary survey of the state-of-the-art MedISeg approaches, but also offers a practical guide for addressing the future medical image processing challenges including but not limited to small dataset learning, class imbalance learning, multi-modality learning, and domain adaptation. The code has been released at: https://github.com/hust-linyi/MedISeg
2022-09-21 13:24:20 - Partially Observable Markov Decision Processes in Robotics: A Survey - *Mikko Lauri, David Hsu, Joni Pajarinen* - `2209.10342v1` - [abs](http://arxiv.org/abs/2209.10342v1) - [pdf](http://arxiv.org/pdf/2209.10342v1) > Noisy sensing, imperfect control, and environment changes are defining characteristics of many real-world robot tasks. The partially observable Markov decision process (POMDP) provides a principled mathematical framework for modeling and solving robot decision and control tasks under uncertainty. Over the last decade, it has seen many successful applications, spanning localization and navigation, search and tracking, autonomous driving, multi-robot systems, manipulation, and human-robot interaction. This survey aims to bridge the gap between the development of POMDP models and algorithms at one end and application to diverse robot decision tasks at the other. It analyzes the characteristics of these tasks and connects them with the mathematical and algorithmic properties of the POMDP framework for effective modeling and solution. For practitioners, the survey provides some of the key task characteristics in deciding when and how to apply POMDPs to robot tasks successfully. For POMDP algorithm designers, the survey provides new insights into the unique challenges of applying POMDPs to robot systems and points to promising new directions for further research.
2022-09-21 15:10:21 - Transformers in 3D Point Clouds: A Survey - *Dening Lu, Qian Xie, Mingqiang Wei, Kyle Gao, Linlin Xu, Jonathan Li* - `2205.07417v2` - [abs](http://arxiv.org/abs/2205.07417v2) - [pdf](http://arxiv.org/pdf/2205.07417v2) > Transformers have been at the heart of the Natural Language Processing (NLP) and Computer Vision (CV) revolutions. The significant success in NLP and CV inspired exploring the use of Transformers in point cloud processing. However, how do Transformers cope with the irregularity and unordered nature of point clouds? How suitable are Transformers for different 3D representations (e.g., point- or voxel-based)? How competent are Transformers for various 3D processing tasks? As of now, there is still no systematic survey of the research on these issues. For the first time, we provided a comprehensive overview of increasingly popular Transformers for 3D point cloud analysis. We start by introducing the theory of the Transformer architecture and reviewing its applications in 2D/3D fields. Then, we present three different taxonomies (i.e., implementation-, data representation-, and task-based), which can classify current Transformer-based methods from multiple perspectives. Furthermore, we present the results of an investigation of the variants and improvements of the self-attention mechanism in 3D. To demonstrate the superiority of Transformers in point cloud analysis, we present comprehensive comparisons of various Transformer-based methods for classification, segmentation, and object detection. Finally, we suggest three potential research directions, providing benefit references for the development of 3D Transformers.
2022-09-21 16:16:23 - A Survey on Generative Diffusion Model - *Hanqun Cao, Cheng Tan, Zhangyang Gao, Guangyong Chen, Pheng-Ann Heng, Stan Z. Li* - `2209.02646v5` - [abs](http://arxiv.org/abs/2209.02646v5) - [pdf](http://arxiv.org/pdf/2209.02646v5) > Deep learning shows great potential in generation tasks thanks to deep latent representation. Generative models are classes of models that can generate observations randomly with respect to certain implied parameters. Recently, the diffusion Model becomes a raising class of generative models by virtue of its power-generating ability. Nowadays, great achievements have been reached. More applications except for computer vision, speech generation, bioinformatics, and natural language processing are to be explored in this field. However, the diffusion model has its natural drawback of a slow generation process, leading to many enhanced works. This survey makes a summary of the field of the diffusion model. We firstly state the main problem with two landmark works - DDPM and DSM. Then, we present a diverse range of advanced techniques to speed up the diffusion models - training schedule, training-free sampling, mixed-modeling, and score & diffusion unification. Regarding existing models, we also provide a benchmark of FID score, IS, and NLL according to specific NFE. Moreover, applications with diffusion models are introduced including computer vision, sequence modeling, audio, and AI for science. Finally, there is a summarization of this field together with limitations & further directions.
2022-09-21 18:52:21 - Human Treelike Tubular Structure Segmentation: A Comprehensive Review and Future Perspectives - *Hao Li, Zeyu Tang, Yang Nan, Guang Yang* - `2207.11203v2` - [abs](http://arxiv.org/abs/2207.11203v2) - [pdf](http://arxiv.org/pdf/2207.11203v2) > Various structures in human physiology follow a treelike morphology, which often expresses complexity at very fine scales. Examples of such structures are intrathoracic airways, retinal blood vessels, and hepatic blood vessels. Large collections of 2D and 3D images have been made available by medical imaging modalities such as magnetic resonance imaging (MRI), computed tomography (CT), Optical coherence tomography (OCT) and ultrasound in which the spatial arrangement can be observed. Segmentation of these structures in medical imaging is of great importance since the analysis of the structure provides insights into disease diagnosis, treatment planning, and prognosis. Manually labelling extensive data by radiologists is often time-consuming and error-prone. As a result, automated or semi-automated computational models have become a popular research field of medical imaging in the past two decades, and many have been developed to date. In this survey, we aim to provide a comprehensive review of currently publicly available datasets, segmentation algorithms, and evaluation metrics. In addition, current challenges and future research directions are discussed.
2022-09-21 21:02:22 - Explaining Anomalies using Denoising Autoencoders for Financial Tabular Data - *Timur Sattarov, Dayananda Herurkar, Jörn Hees* - `2209.10658v1` - [abs](http://arxiv.org/abs/2209.10658v1) - [pdf](http://arxiv.org/pdf/2209.10658v1) > Recent advances in Explainable AI (XAI) increased the demand for deployment of safe and interpretable AI models in various industry sectors. Despite the latest success of deep neural networks in a variety of domains, understanding the decision-making process of such complex models still remains a challenging task for domain experts. Especially in the financial domain, merely pointing to an anomaly composed of often hundreds of mixed type columns, has limited value for experts. Hence, in this paper, we propose a framework for explaining anomalies using denoising autoencoders designed for mixed type tabular data. We specifically focus our technique on anomalies that are erroneous observations. This is achieved by localizing individual sample columns (cells) with potential errors and assigning corresponding confidence scores. In addition, the model provides the expected cell value estimates to fix the errors. We evaluate our approach based on three standard public tabular datasets (Credit Default, Adult, IEEE Fraud) and one proprietary dataset (Holdings). We find that denoising autoencoders applied to this task already outperform other approaches in the cell error detection rates as well as in the expected value rates. Additionally, we analyze how a specialized loss designed for cell error detection can further improve these metrics. Our framework is designed for a domain expert to understand abnormal characteristics of an anomaly, as well as to improve in-house data quality management processes.
2022-09-22 03:10:45 - Reinforcement Learning in Computing and Network Convergence Orchestration - *Aidong Yang, Mohan Wu, Boquan Cheng, Xiaozhou Ye, Ye Ouyang* - `2209.10753v1` - [abs](http://arxiv.org/abs/2209.10753v1) - [pdf](http://arxiv.org/pdf/2209.10753v1) > As computing power is becoming the core productivity of the digital economy era, the concept of Computing and Network Convergence (CNC), under which network and computing resources can be dynamically scheduled and allocated according to users' needs, has been proposed and attracted wide attention. Based on the tasks' properties, the network orchestration plane needs to flexibly deploy tasks to appropriate computing nodes and arrange paths to the computing nodes. This is a orchestration problem that involves resource scheduling and path arrangement. Since CNC is relatively new, in this paper, we review some researches and applications on CNC. Then, we design a CNC orchestration method using reinforcement learning (RL), which is the first attempt, that can flexibly allocate and schedule computing resources and network resources. Which aims at high profit and low latency. Meanwhile, we use multi-factors to determine the optimization objective so that the orchestration strategy is optimized in terms of total performance from different aspects, such as cost, profit, latency and system overload in our experiment. The experiments shows that the proposed RL-based method can achieve higher profit and lower latency than the greedy method, random selection and balanced-resource method. We demonstrate RL is suitable for CNC orchestration. This paper enlightens the RL application on CNC orchestration.
2022-09-22 05:32:09 - Homophone Reveals the Truth: A Reality Check for Speech2Vec - *Guangyu Chen* - `2209.10791v1` - [abs](http://arxiv.org/abs/2209.10791v1) - [pdf](http://arxiv.org/pdf/2209.10791v1) > Generating spoken word embeddings that possess semantic information is a fascinating topic. Compared with text-based embeddings, they cover both phonetic and semantic characteristics, which can provide richer information and are potentially helpful for improving ASR and speech translation systems. In this paper, we review and examine the authenticity of a seminal work in this field: Speech2Vec. First, a homophone-based inspection method is proposed to check the speech embeddings released by the author of Speech2Vec. There is no indication that these embeddings are generated by the Speech2Vec model. Moreover, through further analysis of the vocabulary composition, we suspect that a text-based model fabricates these embeddings. Finally, we reproduce the Speech2Vec model, referring to the official code and optimal settings in the original paper. Experiments showed that this model failed to learn effective semantic embeddings. In word similarity benchmarks, it gets a correlation score of 0.08 in MEN and 0.15 in WS-353-SIM tests, which is over 0.5 lower than those described in the original paper. Our data and code are available.
2022-09-22 06:31:07 - IntereStyle: Encoding an Interest Region for Robust StyleGAN Inversion - *Seungjun Moon, GyeongMoon Park* - `2209.10811v1` - [abs](http://arxiv.org/abs/2209.10811v1) - [pdf](http://arxiv.org/pdf/2209.10811v1) > Recently, manipulation of real-world images has been highly elaborated along with the development of Generative Adversarial Networks (GANs) and corresponding encoders, which embed real-world images into the latent space. However, designing encoders of GAN still remains a challenging task due to the trade-off between distortion and perception. In this paper, we point out that the existing encoders try to lower the distortion not only on the interest region, e.g., human facial region but also on the uninterest region, e.g., background patterns and obstacles. However, most uninterest regions in real-world images are located at out-of-distribution (OOD), which are infeasible to be ideally reconstructed by generative models. Moreover, we empirically find that the uninterest region overlapped with the interest region can mangle the original feature of the interest region, e.g., a microphone overlapped with a facial region is inverted into the white beard. As a result, lowering the distortion of the whole image while maintaining the perceptual quality is very challenging. To overcome this trade-off, we propose a simple yet effective encoder training scheme, coined IntereStyle, which facilitates encoding by focusing on the interest region. IntereStyle steers the encoder to disentangle the encodings of the interest and uninterest regions. To this end, we filter the information of the uninterest region iteratively to regulate the negative impact of the uninterest region. We demonstrate that IntereStyle achieves both lower distortion and higher perceptual quality compared to the existing state-of-the-art encoders. Especially, our model robustly conserves features of the original images, which shows the robust image editing and style mixing results. We will release our code with the pre-trained model after the review.
2022-09-22 07:03:28 - Memory-Augmented Graph Neural Networks: A Neuroscience Perspective - *Guixiang Ma, Vy Vo, Theodore Willke, Nesreen K. Ahmed* - `2209.10818v1` - [abs](http://arxiv.org/abs/2209.10818v1) - [pdf](http://arxiv.org/pdf/2209.10818v1) > Graph neural networks (GNNs) have been extensively used for many domains where data are represented as graphs, including social networks, recommender systems, biology, chemistry, etc. Recently, the expressive power of GNNs has drawn much interest. It has been shown that, despite the promising empirical results achieved by GNNs for many applications, there are some limitations in GNNs that hinder their performance for some tasks. For example, since GNNs update node features mainly based on local information, they have limited expressive power in capturing long-range dependencies among nodes in graphs. To address some of the limitations of GNNs, several recent works started to explore augmenting GNNs with memory for improving their expressive power in the relevant tasks. In this paper, we provide a comprehensive review of the existing literature of memory-augmented GNNs. We review these works through the lens of psychology and neuroscience, which has established multiple memory systems and mechanisms in biological brains. We propose a taxonomy of the memory GNN works, as well as a set of criteria for comparing the memory mechanisms. We also provide critical discussions on the limitations of these works. Finally, we discuss the challenges and future directions for this area.
2022-09-22 10:22:27 - Bootstrapping Multi-view Representations for Fake News Detection - *Qichao Ying, Xiaoxiao Hu, Yangming Zhou, Zhenxing Qian, Dan Zeng, Shiming Ge* - `2206.05741v3` - [abs](http://arxiv.org/abs/2206.05741v3) - [pdf](http://arxiv.org/pdf/2206.05741v3) > Previous researches on multimedia fake news detection include a series of complex feature extraction and fusion networks to gather useful information from the news. However, how cross-modal consistency relates to the fidelity of news and how features from different modalities affect the decision-making are still open questions. This paper presents a novel scheme of Bootstrapping Multi-view Representations (BMR) for fake news detection. Given a multi-modal news, we extract representations respectively from the views of the text, the image pattern and the image semantics. Improved Multi-gate Mixture-of-Expert networks (iMMoE) are proposed for feature refinement and fusion. Representations from each view are separately used to coarsely predict the fidelity of the whole news, and the multimodal representations are able to predict the cross-modal consistency. With the prediction scores, we reweigh each view of the representations and bootstrap them for fake news detection. Extensive experiments conducted on typical fake news detection datasets prove that the proposed BMR outperforms state-of-the-art schemes.
2022-09-22 12:03:33 - Implementing and Experimenting with Diffusion Models for Text-to-Image Generation - *Robin Zbinden* - `2209.10948v1` - [abs](http://arxiv.org/abs/2209.10948v1) - [pdf](http://arxiv.org/pdf/2209.10948v1) > Taking advantage of the many recent advances in deep learning, text-to-image generative models currently have the merit of attracting the general public attention. Two of these models, DALL-E 2 and Imagen, have demonstrated that highly photorealistic images could be generated from a simple textual description of an image. Based on a novel approach for image generation called diffusion models, text-to-image models enable the production of many different types of high resolution images, where human imagination is the only limit. However, these models require exceptionally large amounts of computational resources to train, as well as handling huge datasets collected from the internet. In addition, neither the codebase nor the models have been released. It consequently prevents the AI community from experimenting with these cutting-edge models, making the reproduction of their results complicated, if not impossible. In this thesis, we aim to contribute by firstly reviewing the different approaches and techniques used by these models, and then by proposing our own implementation of a text-to-image model. Highly based on DALL-E 2, we introduce several slight modifications to tackle the high computational cost induced. We thus have the opportunity to experiment in order to understand what these models are capable of, especially in a low resource regime. In particular, we provide additional and analyses deeper than the ones performed by the authors of DALL-E 2, including ablation studies. Besides, diffusion models use so-called guidance methods to help the generating process. We introduce a new guidance method which can be used in conjunction with other guidance methods to improve image quality. Finally, the images generated by our model are of reasonably good quality, without having to sustain the significant training costs of state-of-the-art text-to-image models.
2022-09-22 13:08:04 - Bias at a Second Glance: A Deep Dive into Bias for German Educational Peer-Review Data Modeling - *Thiemo Wambsganss, Vinitra Swamy, Roman Rietsche, Tanja Käser* - `2209.10335v2` - [abs](http://arxiv.org/abs/2209.10335v2) - [pdf](http://arxiv.org/pdf/2209.10335v2) > Natural Language Processing (NLP) has become increasingly utilized to provide adaptivity in educational applications. However, recent research has highlighted a variety of biases in pre-trained language models. While existing studies investigate bias in different domains, they are limited in addressing fine-grained analysis on educational and multilingual corpora. In this work, we analyze bias across text and through multiple architectures on a corpus of 9,165 German peer-reviews collected from university students over five years. Notably, our corpus includes labels such as helpfulness, quality, and critical aspect ratings from the peer-review recipient as well as demographic attributes. We conduct a Word Embedding Association Test (WEAT) analysis on (1) our collected corpus in connection with the clustered labels, (2) the most common pre-trained German language models (T5, BERT, and GPT-2) and GloVe embeddings, and (3) the language models after fine-tuning on our collected data-set. In contrast to our initial expectations, we found that our collected corpus does not reveal many biases in the co-occurrence analysis or in the GloVe embeddings. However, the pre-trained German language models find substantial conceptual, racial, and gender bias and have significant changes in bias across conceptual and racial axes during fine-tuning on the peer-review data. With our research, we aim to contribute to the fourth UN sustainability goal (quality education) with a novel dataset, an understanding of biases in natural language education data, and the potential harms of not counteracting biases in language models for educational tasks.

Content here

文档信息

Search

    Table of Contents