Policy generation from latent embeddings for reinforcement learning

Artaud, Corentin; Moreira-Pina, Rafael; Shi, Xiyu; De-Silva, Varuna

File(s) under embargo

Reason: Publisher requirement.

4 month(s) 9 day(s) until file(s) become available

conference contribution

posted on 2023-06-16, 15:15 authored by Corentin Artaud, Rafael Moreira-Pina, Xiyu Shi, Varuna De-Silva

The human brain endows us with extraordinary capabilities that enable us to create, imagine, and generate anything we desire. Specifically, we have fascinating imaginative skills allowing us to generate fundamental knowledge from abstract concepts. Motivated by these traits, numerous areas of machine learning, notably unsupervised learning and reinforcement learning, have started using such ideas at their core. Nevertheless, these methods do not come without fault. A fundamental issue with reinforcement learning, especially now that it is used with neural networks as function approximators, is its limited achievable optimality compared to its uses from tabula rasa. Due to the nature of learning with neural networks, the behaviours achievable for each task are inconsistent, and providing a unified approach that enables such optimal policies to exist within a parameter space would facilitate both the learning procedure and the behaviour outcomes. Consequently, we are interested in discovering whether reinforcement learning can be facilitated with unsupervised learning methods in a way that alleviates this shortcoming. This work aims to provide an analysis of the feasibility of using generative models to extract learnt reinforcement learning policies (i.e. model parameters) with the intention of conditionally sampling the learnt policy-latent space to generate new policies. We demonstrate that under the currently proposed architecture, these models are able to recreate policies on simple tasks but fail on more complex ones. We therefore provide a critical analysis of these failures and discuss further improvements which would aid the proliferation of this work.

History

School

Loughborough University London

Published in

Intelligent Systems and Pattern Recognition

Volume

1941

Pages

155–168

Source

The International Conference on Intelligent Systems & Pattern Recognition

Publisher

Springer

Version

AM (Accepted Manuscript)

Publisher statement

This version of the contribution has been accepted for publication, after peer review (when applicable) but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/978-3-031-46338-9_12. Use of this Accepted Version is subject to the publisher’s Accepted Manuscript terms of use https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms

Publication date

2023-11-05

Copyright date

2024

DOI

https://doi.org/10.1007/978-3-031-46338-9_12

ISBN

9783031463372; 9783031463389

Publisher version

https://doi.org/10.1007/978-3-031-46338-9_12

Book series

Communications in Computer and Information Science

Language

en

Editor(s)

Akram Bennour; Ahmed Bouridane; Lotfi Chaari

Location

Hammamet, Tunisia

Event dates

11th May 2023 - 13th May 2023

Depositor

Dr Xiyu Shi. Deposit date: 15 June 2023

Keywords

Reinforcement Learning; Policy Modeling; Deep Generative Models
