Denoising-Oriented Deep Hierarchical Reinforcement Learning for Next-basket Recommendation


This research paper presents a novel approach, HRL4Ba, for Next-basket Recommendation (NBR) by addressing the challenge of guiding recommendations based on historical baskets that may contain noise products. The proposed Hierarchical Reinforcement Learning framework incorporates dynamic context modeling and reward function optimization to enhance recommendation accuracy and relevance. By identifying relevant products and maximizing utility through denoising, this method aims to improve user experience in basket recommendations.


Uploaded on Sep 24, 2024



Presentation Transcript


  1. Denoising-Oriented Deep Hierarchical Reinforcement Learning for Next-basket Recommendation. Qihan Du, Li Yu*, Huiyuan Li, Youfang Leng, and Ningrui Ou. School of Information, Renmin University of China. duqihan@ruc.edu.cn

  2. Outline. (1) Introduction; (2) Method: HRL4Ba; (3) Experimental Results; (4) Conclusion. Denoising-Oriented Deep Hierarchical Reinforcement Learning For Next-basket Recommendation, Qihan Du, May 11, 2022, ICASSP 2022.

  3. Introduction. What is next-basket recommendation (NBR)? NBR aims to recommend a basket of items to a user on their next visit by considering the sequence of their historical baskets. [Figure: a sequence of historical baskets and the recommended basket, from [1].] [1] Y.Q. Qin, P.F. Wang, and C.L. Li, "The world is binary: Contrastive learning for denoising next basket recommendation," SIGIR, 2021, pp. 859-868.

  4. Introduction. Main problem: A historical basket may contain noisy products that are irrelevant to the user's next choice. Our idea: Identifying the relevant products to guide recommendation is therefore a critical issue for NBR.

  5. Introduction. New challenge of next-basket recommendation: The relevance of a product depends on the target user's dynamic context rather than on a fixed policy. How can the context be modeled precisely? [Figure: (1) fixed policy modeling versus (2) dynamic context modeling; each panel asks "relevant or not?", with the example context "Halloween".]

  6. Introduction. Main contribution: We propose a Hierarchical Reinforcement Learning framework for Basket Recommendation (HRL4Ba) that performs dynamic basket denoising. We model the hierarchical dynamic context jointly at the basket level and the item level. Our optimization objective is to maximize the incremental utility of basket denoising, and we design the reward function to match this objective.

  7. Method: HRL4Ba. Problem formulation: User set $U = \{u_1, \dots, u_M\}$, item set $I = \{i_1, \dots, i_N\}$. For each user $u$, the historical basket sequence is $B = \{b_1, \dots, b_t\}$, where $b_t = \{i_1^t, i_2^t, \dots, i_n^t\}$ is the basket she purchased at time $t$ and $n$ is the size of the basket. As a sequential recommendation task, our goal is to recommend the basket $b_{t+1}$ to her at time $t + 1$.
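The problem formulation on this slide can be made concrete with a small sketch. It assumes items are integer ids and that a user's baskets are already sorted by time; the function name `split_history` and the sample ids are illustrative only and do not come from the slides.

```python
from typing import List, Tuple

Basket = List[int]  # assumption: a basket is a list of integer item ids


def split_history(baskets: List[Basket]) -> Tuple[List[Basket], Basket]:
    """Split time-ordered baskets into the history and the next basket to predict."""
    if len(baskets) < 2:
        raise ValueError("need at least one historical basket and one target basket")
    return baskets[:-1], baskets[-1]


# Hypothetical user with three visits; the last visit is the prediction target.
user_baskets = [[3, 7], [7, 12, 5], [3, 5]]
history, next_basket = split_history(user_baskets)
```

Here `history` plays the role of the historical basket sequence and `next_basket` is the basket to be recommended at the following time step.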

  8. Method: HRL4Ba. Model overview: The high-level (low-level) agent observes the inter-basket (intra-basket) context as its state and decides whether to mask or retain a basket (item) as its action. The environment selects candidates and calculates the recommendation probability of the target product $i_c$ before and after denoising; the reward is the change in that probability relative to the original, undenoised history.

  9. Method: HRL4Ba. High-level agent (inter-basket context modeling for basket-level denoising). High-level state encoder: obtains the state vector $s^h$ from the inter-basket context $c^h$ over $b_1, \dots, b_t$ and the target item $i_c$: $s^h = \text{StateEncoder}(c^h, i_c)$. High-level actor: decides which baskets contain noise for the target item: $a^h_j = \text{Actor}(s^h)$, $a^h_j \in \{0, 1\}$, where $a^h_j = 1$ means that the basket $b_j$ contains noise and $a^h_j = 0$ otherwise. High-level critic: estimates the Q-value of the state-action pair: $Q(s^h, a^h_j) = \text{CriticNetwork}(s^h, a^h_j)$.
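The slide does not say how the actor maps a state vector to a binary action. The sketch below assumes one simple possibility, a logistic policy over a linear score with Bernoulli sampling; the weights, the bias, and the function names are hypothetical and are not the authors' network.

```python
import math
import random
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def actor_probability(state: List[float], weights: List[float], bias: float = 0.0) -> float:
    """Probability that the action is 1 ("contains noise") under a linear-logistic policy."""
    logit = sum(s * w for s, w in zip(state, weights)) + bias
    return sigmoid(logit)


def sample_action(prob_noise: float, rng: random.Random) -> int:
    """Draw a binary action: 1 = flag as noisy, 0 = retain."""
    return 1 if rng.random() < prob_noise else 0


rng = random.Random(0)
p = actor_probability([0.2, -0.4], [1.5, 0.5])  # toy state and weights
action = sample_action(p, rng)
```

A stochastic policy of this kind lets the agent explore both actions during training; a critic would then score the chosen state-action pair.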

  10. Method: HRL4Ba. Low-level agent (intra-basket context modeling for item-level denoising). If $a^h_j = 1$, meaning that the basket $b_j = \{i_1^j, i_2^j, \dots, i_n^j\}$ contains noise, the low-level agent must decide which items are noise and mask them. Low-level state encoder: obtains the state vector $s^l$ from the intra-basket context $c^l$ over $i_1^j, i_2^j, \dots, i_n^j$ and the target item $i_c$: $s^l = \text{StateEncoder}(c^l, i_c)$. Low-level actor: decides whether to mask item $i_k^j$ as noise: $a^l_k = \text{Actor}(s^l)$, where $a^l_k = 1$ means yes and $a^l_k = 0$ otherwise. Low-level critic: estimates the expected return of the state-action pair: $Q(s^l, a^l_k) = \text{CriticNetwork}(s^l, a^l_k)$.
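The interplay of the two levels described on slides 9 and 10 can be sketched as a masking step: a basket flagged 0 by the high-level agent is kept whole, and a basket flagged 1 is passed to the low-level agent, whose per-item actions remove the items marked as noise. The function name `denoise` and the hard-coded actions are illustrative assumptions; in the actual framework the actions come from the learned actors.

```python
from typing import List


def denoise(
    baskets: List[List[int]],
    high_actions: List[int],
    low_actions: List[List[int]],
) -> List[List[int]]:
    """Apply basket-level then item-level decisions (1 = noisy, 0 = keep)."""
    if not (len(baskets) == len(high_actions) == len(low_actions)):
        raise ValueError("one high-level action and one low-level list per basket")
    denoised = []
    for basket, a_high, a_low in zip(baskets, high_actions, low_actions):
        if a_high == 0:
            # Basket judged clean: the low-level agent is not consulted.
            denoised.append(list(basket))
            continue
        if len(a_low) != len(basket):
            raise ValueError("one low-level action per item in a noisy basket")
        denoised.append([item for item, masked in zip(basket, a_low) if masked == 0])
    return denoised


baskets = [[3, 7], [7, 12, 5]]
cleaned = denoise(baskets, high_actions=[0, 1], low_actions=[[0, 0], [0, 1, 0]])
```

In this toy run only item 12 in the second basket is masked; the first basket is untouched because its high-level action is 0.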

  11. Method: HRL4Ba. Environment (Recommender). Recommender: recalls the top-$K$ ($K \ll N$) items with the highest scores as candidates and picks them one by one as the target item $i_c$. The recommendation probability of $i_c$ is computed from $z$, the user's current preference, which is extracted by an attentive aggregator: $z = \text{Attention}(\cdot)$ over the historical baskets. Reward function: the positive incremental utility in the recommendation probability of the target item $i_c$ before and after denoising: $R(s, a) = p_{\text{denoise}} - p_{\text{original}} = p(i_c \mid z_{\text{denoise}}) - p(i_c \mid z_{\text{original}})$.
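The reward on this slide is a difference of two recommendation probabilities for the same target item. The sketch below illustrates that difference with a deliberately simple stand-in recommender: it replaces the attentive aggregator with mean pooling of item embeddings and scores items with a softmax over dot products. Both choices, the embeddings, and all function names are assumptions made for illustration.

```python
import math
from typing import Dict, List

Embedding = List[float]


def preference(baskets: List[List[int]], emb: Dict[int, Embedding]) -> Embedding:
    """Mean-pool item embeddings (a stand-in for the attentive aggregator)."""
    items = [i for basket in baskets for i in basket]
    dim = len(next(iter(emb.values())))
    if not items:
        return [0.0] * dim
    return [sum(emb[i][d] for i in items) / len(items) for d in range(dim)]


def recommend_prob(target: int, baskets: List[List[int]], emb: Dict[int, Embedding]) -> float:
    """Softmax probability of the target item given the pooled preference."""
    z = preference(baskets, emb)
    scores = {i: sum(a * b for a, b in zip(vec, z)) for i, vec in emb.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {i: math.exp(s - top) for i, s in scores.items()}
    return exps[target] / sum(exps.values())


def reward(target: int, original: List[List[int]], denoised: List[List[int]],
           emb: Dict[int, Embedding]) -> float:
    """Probability after denoising minus probability before denoising."""
    return recommend_prob(target, denoised, emb) - recommend_prob(target, original, emb)


# Toy embeddings: items 1 and 3 are similar, item 2 is unrelated to the target 3.
emb = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 0.0]}
r = reward(target=3, original=[[1, 2]], denoised=[[1]], emb=emb)
```

Masking the unrelated item 2 shifts the pooled preference toward the target, so the reward is positive in this toy case; a masking decision that removed a relevant item would yield a negative value under the same definition.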

  12. Experimental Results. Datasets: Tafeng and Instacart. Evaluation metrics: F1@K and NDCG@K. Experimental design: an overall comparison, followed by three research questions. RQ1: Does the RL-based denoising module work? RQ2: Is the hierarchical structure helpful for denoising? RQ3: What is the effect of the number of candidates $K$?
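The slides name F1@K and NDCG@K without defining them. The sketch below uses the common definitions with binary relevance (an item is relevant if it appears in the ground-truth next basket); whether the authors used exactly these variants is an assumption, and the sample lists are made up.

```python
import math
from typing import Sequence, Set


def f1_at_k(recommended: Sequence[int], truth: Set[int], k: int) -> float:
    """Harmonic mean of precision@k and recall@k (recommended items assumed unique)."""
    if k <= 0 or not truth:
        return 0.0
    hits = len(set(recommended[:k]) & truth)
    if hits == 0:
        return 0.0
    precision = hits / k
    recall = hits / len(truth)
    return 2 * precision * recall / (precision + recall)


def ndcg_at_k(recommended: Sequence[int], truth: Set[int], k: int) -> float:
    """Binary-relevance NDCG@k with a log2 position discount."""
    if k <= 0 or not truth:
        return 0.0
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in truth
    )
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(truth))))
    return dcg / ideal


ranked = [5, 3, 9, 1]   # hypothetical ranked recommendation
actual = {3, 5, 8}      # hypothetical ground-truth next basket
f1 = f1_at_k(ranked, actual, k=3)
ndcg = ndcg_at_k(ranked, actual, k=3)
```

F1@K ignores the order within the top K, while NDCG@K rewards placing relevant items nearer the top, which is why the two are usually reported together.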

  13. Experimental Results. Overall comparison: The baselines are traditional methods (POP, FPMC, DREAM), attention-based methods (ANAM, IntNet), and denoising methods (Beacon, CLEA). Table 2 shows that HRL4Ba outperforms the baselines on all datasets and metrics. RQ1: Does the RL-based denoising module work? We test R4Ba, a variant without the basket denoising module. Table 2 shows that the performance of R4Ba drops sharply, i.e., the SOTA performance mainly benefits from the denoising module.

  14. Experimental Results. RQ2: Is the hierarchical structure helpful for denoising? We test two variants, H4Ba and L4Ba, which use only the high-level agent and only the low-level agent, respectively. Without the hierarchy, each is restricted to an incomplete context and thus cannot achieve the SOTA performance. RQ3: What is the effect of the number of candidates $K$? We set $N/K \in \{10, 50, 100, 500, 1000\}$. In Figure 2, the perceptual field of HRL4Ba grows with $K$, and the performance improves. However, an oversized $K$ harms the performance. The general paradigm is $K = N/80$.

  15. Conclusions. A challenge: modeling the user's dynamic hierarchical context for basket denoising. An objective: maximize the increase in recommendation probability from before to after basket denoising. A framework: a hierarchical reinforcement learning-based basket denoising framework for better next-basket recommendation.

  16. THANKS 2022.05.11
