SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets

Shenghua Wan, Ziyuan Chen, Le Gan, Shuai Feng, De-Chuan Zhan
2/12/2026

Abstract

Model-based offline reinforcement Learning (RL) is a promising approach that leverages existing data effectively in many real-world applications, especially those involving high-dimensional inputs like images and videos. To alleviate the distribution shift issue in offline RL, existing model-based methods heavily rely on the uncertainty of learned dynamics. However, the model uncertainty estimation becomes significantly biased when observations contain complex distractors with non-trivial dynamics. To address this challenge, we propose a new approach - \emph{Separated Model-based Offline Policy Optimization} (SeMOPO) - decomposing latent states into endogenous and exogenous parts via conservative sampling and estimating model uncertainty on the endogenous states only. We provide a theoretical guarantee of model uncertainty and performance bound of SeMOPO. To assess the efficacy, we construct the Low-Quality Vision Deep Data-Driven Datasets for RL (LQV-D4RL), where the data are collected by non-expert policy and the observations include moving distractors. Experimental results show that our method substantially outperforms all baseline methods, and further analytical experiments validate the critical designs in our method. The project website is \href{https://sites.google.com/view/semopo}{https://sites.google.com/view/semopo}.

View on arXivView PDF

Code Implementations(10)

Refine high-quality datasets and visual AI models

10,501731Apr 22, 20203 weeks agoApache-2.0
active-learningartificial-intelligencecomputer-visiondata-centric-aidata-cleaning+12 more

ExerciseDB API is an fitness exercise database api that allows users to access high-quality exercises data which consists 11000+ exercises. This API offers extensive information on each exercise, including target body parts, equipment needed, video, gifs, images for visual guidance, and step-by-step instructions.

16950Nov 25, 20254 months agoAGPL-3.0
datasetsexercise-database-apiexercise-datasetexercisedb-apiexercises+14 more

Facestar dataset. High quality audio-visual recordings of human conversational speech.

1107Mar 24, 20224 years agoNOASSERTION

This project conducts a comprehensive analysis of Tesla's stock price trends using various statistical and analytical techniques. The dataset consists of historical Tesla stock data, including opening price, closing price, high, low, trading volume, and date information. The analysis involves descriptive statistics, hypothesis testing, probability

10Mar 16, 20251 years ago

When told to understand the shape of a new object, the most instinctual approach is to pick it up and inspect it with your hand and eyes in tandem. Here, touch provides high fidelity localized information while vision provides complementary global context. However, in 3D shape reconstruction, the complementary fusion of visual and haptic modalities remains largely unexplored. In this paper, we study this problem and present an effective chart-based approach to fusing vision and touch, which leverages advances in graph convolutional networks. To do so, we introduce a dataset of simulated touch and vision signals from the interaction between a robotic hand and a large array of 3D objects. Our results show that (1) leveraging both vision and touch signals consistently improves single-modality baselines, especially when the object is occluded by the hand touching it; (2) our approach outperforms alternative modality fusion methods and strongly benefits from the proposed chart-based structure; (3) reconstruction quality boosts with the number of grasps provided; and (4) the touch information not only enhances the reconstruction at the touch site but also extrapolates to its local neighborhood.

7413Oct 14, 20204 years agoNOASSERTION

In this project, we're going to be putting what we've learned into practice. More specificially, we're going to be replicating the deep learning model behind the 2017 paper [*PubMed 200k RCT: a Dataset for Sequenctial Sentence Classification in Medical Abstracts*](https://arxiv.org/abs/1710.06071). When it was released, the paper presented a new dataset called PubMed 200k RCT which consists of ~200,000 labelled Randomized Controlled Trial (RCT) abstracts. The goal of the dataset was to explore the ability for NLP models to classify sentences which appear in sequential order. In other words, given the abstract of a RCT, what role does each sentence serve in the abstract? ![Skimlit example inputs and outputs](https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/09-skimlit-overview-input-and-output.png) *Example inputs ([harder to read abstract from PubMed](https://pubmed.ncbi.nlm.nih.gov/28942748/)) and outputs ([easier to read abstract](https://pubmed.ncbi.nlm.nih.gov/32537182/)) of the model we're going to build. The model will take an abstract wall of text and predict the section label each sentence should have.* ### Model Input For example, can we train an NLP model which takes the following input (note: the following sample has had all numerical symbols replaced with "@"): > To investigate the efficacy of @ weeks of daily low-dose oral prednisolone in improving pain , mobility , and systemic low-grade inflammation in the short term and whether the effect would be sustained at @ weeks in older adults with moderate to severe knee osteoarthritis ( OA ). A total of @ patients with primary knee OA were randomized @:@ ; @ received @ mg/day of prednisolone and @ received placebo for @ weeks. Outcome measures included pain reduction and improvement in function scores and systemic inflammation markers. Pain was assessed using the visual analog pain scale ( @-@ mm ). Secondary outcome measures included the Western Ontario and McMaster Universities Osteoarthritis Index scores , patient global assessment ( PGA ) of the severity of knee OA , and @-min walk distance ( @MWD )., Serum levels of interleukin @ ( IL-@ ) , IL-@ , tumor necrosis factor ( TNF ) - , and high-sensitivity C-reactive protein ( hsCRP ) were measured. There was a clinically relevant reduction in the intervention group compared to the placebo group for knee pain , physical function , PGA , and @MWD at @ weeks. The mean difference between treatment arms ( @ % CI ) was @ ( @-@ @ ) , p < @ ; @ ( @-@ @ ) , p < @ ; @ ( @-@ @ ) , p < @ ; and @ ( @-@ @ ) , p < @ , respectively. Further , there was a clinically relevant reduction in the serum levels of IL-@ , IL-@ , TNF - , and hsCRP at @ weeks in the intervention group when compared to the placebo group. These differences remained significant at @ weeks. The Outcome Measures in Rheumatology Clinical Trials-Osteoarthritis Research Society International responder rate was @ % in the intervention group and @ % in the placebo group ( p < @ ). Low-dose oral prednisolone had both a short-term and a longer sustained effect resulting in less knee pain , better physical function , and attenuation of systemic inflammation in older patients with knee OA ( ClinicalTrials.gov identifier NCT@ ). ### Model output And returns the following output: ``` ['###24293578\n', 'OBJECTIVE\tTo investigate the efficacy of @ weeks of daily low-dose oral prednisolone in improving pain , mobility , and systemic low-grade inflammation in the short term and whether the effect would be sustained at @ weeks in older adults with moderate to severe knee osteoarthritis ( OA ) .\n', 'METHODS\tA total of @ patients with primary knee OA were randomized @:@ ; @ received @ mg/day of prednisolone and @ received placebo for @ weeks .\n', 'METHODS\tOutcome measures included pain reduction and improvement in function scores and systemic inflammation markers .\n', 'METHODS\tPain was assessed using the visual analog pain scale ( @-@ mm ) .\n', 'METHODS\tSecondary outcome measures included the Western Ontario and McMaster Universities Osteoarthritis Index scores , patient global assessment ( PGA ) of the severity of knee OA , and @-min walk distance ( @MWD ) .\n', 'METHODS\tSerum levels of interleukin @ ( IL-@ ) , IL-@ , tumor necrosis factor ( TNF ) - , and high-sensitivity C-reactive protein ( hsCRP ) were measured .\n', 'RESULTS\tThere was a clinically relevant reduction in the intervention group compared to the placebo group for knee pain , physical function , PGA , and @MWD at @ weeks .\n', 'RESULTS\tThe mean difference between treatment arms ( @ % CI ) was @ ( @-@ @ ) , p < @ ; @ ( @-@ @ ) , p < @ ; @ ( @-@ @ ) , p < @ ; and @ ( @-@ @ ) , p < @ , respectively .\n', 'RESULTS\tFurther , there was a clinically relevant reduction in the serum levels of IL-@ , IL-@ , TNF - , and hsCRP at @ weeks in the intervention group when compared to the placebo group .\n', 'RESULTS\tThese differences remained significant at @ weeks .\n', 'RESULTS\tThe Outcome Measures in Rheumatology Clinical Trials-Osteoarthritis Research Society International responder rate was @ % in the intervention group and @ % in the placebo group ( p < @ ) .\n', 'CONCLUSIONS\tLow-dose oral prednisolone had both a short-term and a longer sustained effect resulting in less knee pain , better physical function , and attenuation of systemic inflammation in older patients with knee OA ( ClinicalTrials.gov identifier NCT@ ) .\n', '\n'] ```

31Aug 4, 20223 years ago

High resolution (HR) is desirable in many areas, in particular, in medical imaging. Since such images are an important method to find certain diseases, the high resolution should improve the success rate of correct diagnosis. There are many super-resolution techniques which show their potential in achieving super-resolution images which attempt to reach the quality of HR images. This report provides the detailed overview of most important research articles starting from such traditional methods as interpolation to deep neural networks which have achieved great success in single image super-resolution (SISR). As the idea of skip connections promotes better performance, UNet was implemented and trained with pixel-wise losses (\textit{l1} loss, \textit{l2} loss, and \textit{Smooth L1-loss} ) on SISR dataset of ultrasound medical images collected for this research as a baseline solution to SISR. In order to overcome some limitations as simple, one-step fusion in UNet, as well as to improve visual SISR results, we propose some modifications to this baseline as a further direction of our research - SISR with the Generative Adversarial Network, and Deep Layer Aggregation.

70Sep 4, 20214 years ago

This chart visualizes the distribution of sports in which male and female participants are involved, based on the dataset provided. It highlights the relative frequency and popularity of each sport category, enabling quick identification of high- and low-participation areas.

10Aug 19, 20257 months ago

To visualisation the relations of Superstore dataset. In this dataset there are number of attributes to visualize such attributes I used Bar chart, Line Chart, Scatter plot and other stuff to represent as a meaningful and easily understanding how the Sales and Profits are related and in which region are there high low etc.,

00Jun 24, 20259 months ago
visual-reporting

Data visualization for a number of datasets ranging from dice rolls and random walks to local high-low temperatures, earthquakes worldwide and most popular GitHub Python repositories; based on project 2 from 'Python Crash Course' book by Eric Matthes

00Sep 2, 20241 years ago

Discussion