
爱可可 AI Paper Recommendations (October 30)



LG - Machine Learning | CV - Computer Vision | CL - Computation and Language | AS - Audio and Speech | RO - Robotics

(* indicates papers worth special attention)


1、[LG] *Algorithms for Causal Reasoning in Probability Trees

T Genewein, T McGrath, G Déletang, V Mikulik, M Martic, S Legg, P A. Ortega

[DeepMind]

Algorithms for causal reasoning in probability trees. Presents concrete algorithms for causal reasoning in discrete probability trees that cover the entire causal hierarchy (association, intervention, and counterfactuals) and operate on arbitrary propositional and causal events. The work extends the domain of causal reasoning to a very general class of discrete stochastic processes.

Probability trees are one of the simplest models of causal generative processes. They possess clean semantics and -- unlike causal Bayesian networks -- they can represent context-specific causal dependencies, which are necessary for e.g. causal induction. Yet, they have received little attention from the AI and ML community. Here we present concrete algorithms for causal reasoning in discrete probability trees that cover the entire causal hierarchy (association, intervention, and counterfactuals), and operate on arbitrary propositional and causal events. Our work expands the domain of causal reasoning to a very general class of discrete stochastic processes.
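To make the setting concrete, here is a minimal Python sketch of a discrete probability tree supporting an association query and a hard intervention, in the spirit of the abstract. The node representation, the function names, and the rule of moving all probability mass onto the intervened branch are my own simplifications, not the paper's algorithms.

```python
# Minimal sketch of a discrete probability tree (not the paper's exact algorithms).
# Assumes each variable is resolved at most once along any root-to-leaf path.
from dataclasses import dataclass, field

@dataclass
class Node:
    # each outgoing edge: (probability, (variable, value) it resolves, child node)
    edges: list = field(default_factory=list)

def marginal(node, var, val, p=1.0):
    """Association: P(var = val), summing path probabilities of matching branches."""
    total = 0.0
    for prob, (v, x), child in node.edges:
        if v == var:
            total += p * prob if x == val else 0.0
        else:
            total += marginal(child, var, val, p * prob)
    return total

def intervene(node, var, val):
    """do(var = val): at every node resolving var, move all mass to the
    branch consistent with the intervention."""
    if any(v == var for _, (v, _), _ in node.edges):
        node.edges = [(1.0 if x == val else 0.0, (v, x), c)
                      for _, (v, x), c in node.edges]
    for _, _, child in node.edges:
        intervene(child, var, val)

# tiny tree: first X in {0,1}, then Y depends on X
yx0 = Node([(0.9, ("Y", 0), Node()), (0.1, ("Y", 1), Node())])
yx1 = Node([(0.2, ("Y", 0), Node()), (0.8, ("Y", 1), Node())])
root = Node([(0.5, ("X", 0), yx0), (0.5, ("X", 1), yx1)])

print(marginal(root, "Y", 1))   # association: P(Y=1) = 0.45
intervene(root, "X", 1)
print(marginal(root, "Y", 1))   # intervention: P(Y=1 | do(X=1)) = 0.8
```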

https://weibo.com/1402400261/JrwuViYrR


2、[AS] *Attention is All You Need in Speech Separation

C Subakan, M Ravanelli, S Cornell, M Bronzi, J Zhong

[Mila-Quebec AI Institute & Università Politecnica delle Marche & University of Rochester]

SepFormer, a Transformer-based architecture for speech separation. Proposes the SepFormer (Separation Transformer), a novel RNN-free neural network for speech separation whose masking network is composed of Transformers and learns short- and long-term dependencies with a multi-scale approach. Compared with the latest RNN-based systems, the SepFormer is significantly faster and far less memory-demanding.

Recurrent Neural Networks (RNNs) have long been the dominant architecture in sequence-to-sequence learning. RNNs, however, are inherently sequential models that do not allow parallelization of their computations. Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism. In this paper, we propose the `SepFormer', a novel RNN-free Transformer-based neural network for speech separation. The SepFormer learns short and long-term dependencies with a multi-scale approach that employs transformers. The proposed model matches or overtakes the state-of-the-art (SOTA) performance on the standard WSJ0-2/3mix datasets. It indeed achieves an SI-SNRi of 20.2 dB on WSJ0-2mix matching the SOTA, and an SI-SNRi of 17.6 dB on WSJ0-3mix, a SOTA result. The SepFormer inherits the parallelization advantages of Transformers and achieves a competitive performance even when downsampling the encoded representation by a factor of 8. It is thus significantly faster and it is less memory-demanding than the latest RNN-based systems.
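As a concrete illustration of the mask-based pipeline the abstract describes (learned encoder, Transformer masking network producing one mask per speaker, learned decoder), here is a heavily simplified PyTorch sketch. The layer sizes and the single-scale vanilla TransformerEncoder are placeholders of mine; the actual SepFormer uses a dual-path, multi-scale design.

```python
# Simplified mask-based separation pipeline (not the paper's exact architecture).
import torch
import torch.nn as nn

class TinySepFormer(nn.Module):
    def __init__(self, n_src=2, n_filters=256, kernel=16, stride=8):
        super().__init__()
        self.n_src = n_src
        # learned encoder: waveform -> latent frames
        self.encoder = nn.Conv1d(1, n_filters, kernel, stride=stride)
        # masking network: Transformer layers over the encoded frames
        layer = nn.TransformerEncoderLayer(d_model=n_filters, nhead=8,
                                           batch_first=True)
        self.masker = nn.TransformerEncoder(layer, num_layers=4)
        self.mask_out = nn.Linear(n_filters, n_src * n_filters)
        # learned decoder: masked latents -> waveform
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel, stride=stride)

    def forward(self, mix):                      # mix: (batch, samples)
        h = self.encoder(mix.unsqueeze(1))       # (batch, F, T)
        m = self.masker(h.transpose(1, 2))       # (batch, T, F)
        masks = self.mask_out(m).relu()          # one mask per source
        masks = masks.view(m.size(0), m.size(1), self.n_src, -1)
        outs = []
        for s in range(self.n_src):
            masked = (h.transpose(1, 2) * masks[:, :, s]).transpose(1, 2)
            outs.append(self.decoder(masked).squeeze(1))
        return torch.stack(outs, dim=1)          # (batch, n_src, samples)

mix = torch.randn(2, 16000)                      # two 1-second mixtures at 16 kHz
est = TinySepFormer()(mix)
print(est.shape)                                 # torch.Size([2, 2, 16000])
```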

https://weibo.com/1402400261/JrwygqcX8


3、[CL] *Pre-trained Summarization Distillation

S Shleifer, A M. Rush

[Hugging Face]

A distillation method for pre-trained Transformer summarization models. Removing carefully selected decoder layers from a sequence-to-sequence Transformer and then continuing to fine-tune quickly yields high-quality student models; in some cases, more elaborate training techniques that use the same initialization strategy yield additional quality improvements.

Recent state-of-the-art approaches to summarization utilize large pre-trained Transformer models. Distilling these models to smaller student models has become critically important for practical use; however there are many different distillation methods proposed by the NLP literature. Recent work on distilling BERT for classification and regression tasks shows strong performance using direct knowledge distillation. Alternatively, machine translation practitioners distill using pseudo-labeling, where a small model is trained on the translations of a larger model. A third, simpler approach is to 'shrink and fine-tune' (SFT), which avoids any explicit distillation by copying parameters to a smaller student model and then fine-tuning. We compare these three approaches for distillation of Pegasus and BART, the current and former state of the art, pre-trained summarization models, and find that SFT outperforms knowledge distillation and pseudo-labeling on the CNN/DailyMail dataset, but under-performs pseudo-labeling on the more abstractive XSUM dataset. PyTorch Code and checkpoints of different sizes are available through Hugging Face transformers here this http URL.
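The "shrink and fine-tune" recipe is simple enough to sketch directly with the transformers library. In the hedged sketch below, the choice of which decoder layers to keep is an illustrative evenly spaced heuristic, not necessarily the paper's exact selection.

```python
# Sketch of shrink-and-fine-tune (SFT) for BART; layer choice is illustrative.
import copy
import torch.nn as nn
from transformers import BartForConditionalGeneration

teacher = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

def shrink(teacher, keep=(0, 2, 4, 6, 8, 11)):
    """Build a student that keeps the teacher's encoder and embeddings intact
    but retains only the decoder layers listed in `keep` (teacher indices)."""
    student = copy.deepcopy(teacher)
    layers = student.model.decoder.layers           # ModuleList of 12 layers
    student.model.decoder.layers = nn.ModuleList(layers[i] for i in keep)
    student.config.decoder_layers = len(keep)
    return student

student = shrink(teacher)
print(sum(p.numel() for p in student.parameters()))  # fewer params than teacher
# ...then fine-tune `student` on CNN/DailyMail with the standard seq2seq loss;
# no explicit distillation objective is needed.
```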

https://weibo.com/1402400261/JrwCpsESr


4、[LG] Training Generative Adversarial Networks by Solving Ordinary Differential Equations

C Qin, Y Wu, J T Springenberg, A Brock, J Donahue, T P. Lillicrap, P Kohli

[DeepMind]

Training GANs by solving ordinary differential equations. The work connects a central strand of current machine-learning research (generative models) with a long-established field (the integration of dynamical systems). It hypothesizes that the instability of GAN training stems from the integration error introduced by discretizing the continuous dynamics, and verifies experimentally that well-known ODE solvers (such as Runge-Kutta) stabilize GAN training when combined with a regularizer that controls the integration error.

The instability of Generative Adversarial Network (GAN) training has frequently been attributed to gradient descent. Consequently, recent methods have aimed to tailor the models and training procedures to stabilise the discrete updates. In contrast, we study the continuous-time dynamics induced by GAN training. Both theory and toy experiments suggest that these dynamics are in fact surprisingly stable. From this perspective, we hypothesise that instabilities in training GANs arise from the integration error in discretising the continuous dynamics. We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training - when combined with a regulariser that controls the integration error. Our approach represents a radical departure from previous methods which typically use adaptive optimisation and stabilisation techniques that constrain the functional space (e.g. Spectral Normalisation). Evaluation on CIFAR-10 and ImageNet shows that our method outperforms several strong baselines, demonstrating its efficacy.
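A toy illustration of the core idea: treat the simultaneous gradient flow of a minimax game as an ODE and integrate it with a second-order Runge-Kutta (Heun) step rather than plain Euler/gradient descent. The 1-D bilinear game below and the omission of the paper's integration-error regularizer are deliberate simplifications of mine.

```python
# Toy GAN dynamics integrated with Heun's method (explicit RK2).
import torch

def field(theta, phi, real=0.0):
    """Vector field of the toy minimax game min_theta max_phi phi*(theta-real):
    the generator descends its loss while the critic ascends."""
    theta = theta.clone().requires_grad_(True)
    phi = phi.clone().requires_grad_(True)
    loss = phi * (theta - real)
    g_t, g_p = torch.autograd.grad(loss, (theta, phi))
    return -g_t, g_p

def heun_step(theta, phi, h):
    # average the field at the start point and at an Euler probe point
    k1_t, k1_p = field(theta, phi)
    k2_t, k2_p = field(theta + h * k1_t, phi + h * k1_p)
    return theta + h / 2 * (k1_t + k2_t), phi + h / 2 * (k1_p + k2_p)

theta, phi = torch.tensor(2.0), torch.tensor(1.0)
for _ in range(200):
    theta, phi = heun_step(theta, phi, h=0.1)

# The continuous dynamics here are a pure rotation; RK2 stays close to the
# orbit (radius ~ sqrt(5)), whereas Euler steps of the same size spiral
# outward -- the discretization error the paper blames for instability.
print((theta ** 2 + phi ** 2).sqrt())
```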

https://weibo.com/1402400261/JrwOFcuKc


5、[LG] Label-Aware Neural Tangent Kernel: Toward Better Generalization and Local Elasticity

S Chen, H He, W J. Su

[University of Pennsylvania]

Label-aware neural tangent kernels. Introduces the notion of label-awareness to explain and reduce the performance gap between models trained with the NTK and real neural networks. Inspired by a general label-aware Hoeffding decomposition, two label-aware versions of the NTK are proposed; theoretical analysis and comprehensive experiments show that models trained with the proposed kernels better simulate the behavior of neural networks in terms of generalization ability and local elasticity.

As a popular approach to modeling the dynamics of training overparametrized neural networks (NNs), the neural tangent kernels (NTK) are known to fall behind real-world NNs in generalization ability. This performance gap is in part due to the \textit{label agnostic} nature of the NTK, which renders the resulting kernel not as \textit{locally elastic} as NNs~\citep{he2019local}. In this paper, we introduce a novel approach from the perspective of \emph{label-awareness} to reduce this gap for the NTK. Specifically, we propose two label-aware kernels that are each a superimposition of a label-agnostic part and a hierarchy of label-aware parts with increasing complexity of label dependence, using the Hoeffding decomposition. Through both theoretical and empirical evidence, we show that the models trained with the proposed kernels better simulate NNs in terms of generalization ability and local elasticity.
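A hedged sketch of the superimposition idea: a label-agnostic base kernel plus a λ-weighted label-aware term approximating E[yy'|x,x']. In the code below, an RBF kernel stands in for the NTK and a cheap auxiliary predictor's outputs stand in for the paper's Hoeffding-based estimate; both are placeholders of mine, not the paper's construction.

```python
# Kernel ridge regression with a label-agnostic vs. a label-aware kernel.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n))
Xte = rng.normal(size=(100, d))
yte = np.sign(Xte[:, 0] + 0.5 * Xte[:, 1])

def k0(A, B):
    """Label-agnostic base kernel (RBF as an NTK stand-in)."""
    d2 = ((A[:, None] - B[None]) ** 2).sum(-1)
    return np.exp(-d2 / d)

# cheap auxiliary predictor whose output approximates E[y | x]
w = np.linalg.lstsq(X, y, rcond=None)[0]
ghat = lambda A: np.tanh(A @ w)

def k_la(A, B, lam=0.5):
    """Superimposed kernel: label-agnostic part + label-aware rank-one part
    ghat(x) * ghat(x'), a crude proxy for E[y y' | x, x']."""
    return k0(A, B) + lam * np.outer(ghat(A), ghat(B))

def krr_predict(K_tr, K_te, y, ridge=1e-2):
    alpha = np.linalg.solve(K_tr + ridge * np.eye(len(y)), y)
    return K_te @ alpha

for name, k in [("label-agnostic", k0), ("label-aware", k_la)]:
    pred = krr_predict(k(X, X), k(Xte, X), y)
    print(name, "test acc:", (np.sign(pred) == yte).mean())
```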

https://weibo.com/1402400261/JrwSkocoX


[LG] Enforcing Interpretability and its Statistical Impacts: Trade-offs between Accuracy and Interpretability

Enforcing interpretability and its statistical impacts: studying the trade-off between accuracy (risk) and interpretability with a simplified learning model

G K Dziugaite, S Ben-David, D M. Roy

[Element AI & University of Waterloo & University of Toronto]

https://weibo.com/1402400261/JrwGVDuwv


[LG] The geometry of integration in text classification RNNs

The geometry of integration in text-classification RNNs

K Aitken, V V. Ramasesh, A Garg, Y Cao, D Sussillo, N Maheswaranathan

[University of Washington & Google]

https://weibo.com/1402400261/JrwY6vdA0


[LG] Generalized eigen, singular value, and partial least squares decompositions: The GSVD package

Generalized eigen, singular value, and partial least squares decompositions: the GSVD package

D Beaton

[Baycrest Health Sciences]

https://weibo.com/1402400261/Jrx0xxcB7
