[{"content":"I\u0026rsquo;ve spent the last decade and a half working on search, recommendation, and — for the past few years — large language models. Along the way I\u0026rsquo;ve collected a lot of opinions, a lot of war stories, and a lot of half-finished notes that never made it out of my private scratchpads.\nThis site is where I\u0026rsquo;m going to start pushing some of those notes into the open. Expect posts on:\nLLM pretraining and post-training at scale — GRPO, DPO, FSDP/DeepSpeed, multinode training on H100/H200 clusters. Multimodal learning — vision-language models for search, retrieval, and grounded summarization. Search and ranking — query understanding, relevance, and how LLMs reshape classical IR stacks. Things I wish I\u0026rsquo;d known earlier — debugging distributed training, evaluating generative systems, managing ML teams. If any of this resonates, you can find my publications here, my projects here, or just email me (link in the sidebar).\nMore soon.\n","permalink":"https://xuewyang.github.io/posts/hello-world/","summary":"A short note about why I\u0026rsquo;m writing here and what to expect.","title":"Hello, world"},{"content":"A selected set of projects I\u0026rsquo;ve led over the years. Most are work done under NDA, so descriptions stay at a high level.\nCoupangGPT-VL — Vision-Language E-commerce Model 2024 — present · Coupang · Team Lead\nA unified vision-language foundation model for e-commerce. I initiated and led the multimodal redesign of Coupang\u0026rsquo;s search experience: queries and product metadata (titles, images, descriptions, OCR text) are encoded into a shared embedding space for multimodal retrieval.\nBuilt a multi-turn conversational ShopBot powered by CoupangGPT-VL for natural-language product discovery. Improved click-through rate (CTR) by 13.4% on the redesigned search surface. 
Enabled better semantic retrieval on hard queries (\u0026ldquo;breathable shoe cabinet\u0026rdquo; → items with ventilation and airflow attributes). Introduced grounded summaries that explain why a product was recommended. CoupangGPT — Scalable E-commerce Language Model 2023 — 2025 · Coupang · Team Lead\nLed end-to-end development of CoupangGPT, a multimodal reasoning GPT pretrained on 1.5 trillion tokens, images, and proprietary e-commerce data — powered by GRPO reinforcement learning and multimodal instruction tuning. Used across search relevance, search ranking, catalog, and ads quality.\nManaged a team of 5 ML engineers, a data scientist, and several product analysts. Partnered with infra to stand up Coupang\u0026rsquo;s first multinode training cluster — 1000+ A100 / H100 / H200 GPUs on p4 / p5 / p5en instances with EFA and InfiniBand. Delivered weekly progress updates to VP and Director-level stakeholders. Pretraining / post-training stack: PyTorch, FSDP, DeepSpeed, GRPO, DPO. Search Relevance with CoupangGPT 2023 — 2025 · Coupang · Team Lead\nReformulated search relevance as an instruction fine-tuning problem — does this (query, product title, metadata) tuple constitute an exact match? Drove EM accuracy from 0.55 → 0.99 through iterative distillation from larger teacher models.\nTimeline | Model | EM Accuracy\nQ4 2023 | DCN (1.3M) on 0.2M pairs | 0.55\nQ1 2024 | RoBERTa (123M) on 0.2M human labels | 0.78\nQ2 2024 | RoBERTa distilled from CoupangGPT-8B | 0.87\nQ3 2024 | RoBERTa distilled from CoupangGPT-27B | 0.93\nQ4 2024 | RoBERTa distilled from CoupangGPT-27B | 0.96\nQ1 2025 | RoBERTa + reasoning traces from 27B | 0.97\nQ2 2025 | CoupangGPT-1B + reasoning (student) | 0.99\nCoordinated 100+ human annotators with PMs to nail down labeling guidelines. 
The EM lift translated into a measurable lift in purchase rate.\nQuery Understanding with CoupangGPT 2024 — 2025 · Coupang · Team Lead\nA multi-task query understanding pipeline — spelling correction, brand detection, query category prediction, query expansion — trained without any human labels for the bootstrap round.\nUsed Gemini-3-Pro via advanced prompt engineering to generate zero-human-label training data. Scaled via knowledge distillation: Gemini-3-Pro → CoupangGPT student. The distilled student matched or surpassed the teacher on internal evals. Shipped a 2–3% lift in PPC (purchases per customer). OGPT — On-Device Chatbot for OPPO Smartphones 2021 — 2023 · OPPO · Tech Lead\nA high-performance chatbot deployed on smartphones, serving 10M+ monthly users.\nApplied DPO for instruction-following alignment. Distributed training across multiple nodes with FSDP, Lightning, DeepSpeed. Integrated multiple LLMs with LangChain; advanced prompt engineering to extend utility across use cases. Semantic Search for OPPO Photo Album 2021 — 2023 · OPPO\nReplaced keyword matching on the smartphone Photo Album with a vision-language semantic search stack. Contrastive pretraining of text-image pairs, transformer-based query encoding, and nearest-neighbor retrieval delivered a 10% F1 improvement over the previous system.\nFashion Captioning \u0026amp; FACAD 2019 — 2020 · Stony Brook · PhD research\nReleased the FACAD dataset and a reinforcement-learning-with-semantic-rewards method for generating accurate fashion captions, published at ECCV 2020.\nGitHub — xuewyang/Fashion_Captioning Benchmark repo Open Source 2024 — present\nActive contributor to Axolotl and ms-swift — scalable post-training, instruction tuning, and RL training pipelines.\n","permalink":"https://xuewyang.github.io/projects/","summary":"Selected projects I\u0026rsquo;ve led or worked on.","title":"Projects"},{"content":"A selection of peer-reviewed publications. 
For a more complete list see my Google Scholar profile.\n2022\nReFormer: The Relational Transformer for Image Captioning\nXuewen Yang, Yingru Liu, Xin Wang\nProceedings of the 30th ACM International Conference on Multimedia (ACM MM), 2022.\n[semantic scholar]\n2021\nJournalistic Guidelines Aware News Image Captioning\nXuewen Yang, Svebor Karaman, Joel Tetreault, Alejandro Jaimes\nProceedings of EMNLP, 2021.\n[arXiv] · [code]\n2020\nFashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards\nXuewen Yang, Heming Zhang, Di Jin, Yingru Liu, Chi-Hao Wu, Jianchao Tan, Dongliang Xie, Jue Wang, Xin Wang\nEuropean Conference on Computer Vision (ECCV), 2020.\nIntroduces the FACAD dataset for fashion image captioning.\n[springer] · [code + data]\nAdaptive Activation Network and Functional Regularization for Efficient and Flexible Deep Multi-Task Learning\nYingru Liu, Xuewen Yang, Dongliang Xie, Xin Wang, Li Shen, Haozhi Huang, Niranjan Balasubramanian\nAAAI Conference on Artificial Intelligence (AAAI), 2020.\n[AAAI]\nPaper Reviewing\nConducted 200+ reviews for NeurIPS, ACL, EMNLP, CVPR, ECCV, ICCV, AAAI, IJCAI, TPAMI, Knowledge-Based Systems, and IEEE Transactions on Affective Computing.\nOpen Source\nActive contributor to Axolotl and ms-swift — submissions around scalable post-training, instruction tuning, and RL training pipelines.\n","permalink":"https://xuewyang.github.io/publications/","summary":"Selected publications and preprints.","title":"Publications"}]