<aside>
🧠 Hi there! Chi Xing is an M.Sc student in AI at the University of Edinburgh. His research interests lie at the intersection of Machine Learning and Distributed Computer Systems, with a focus on developing serverless generative model services and exploring their theoretical and practical aspects. He has a solid background in both research and engineering, having obtained a B.Sc degree in Computer Science with first-class honours from the University of Liverpool.
</aside>

🎓 Education
M.Sc
Artificial Intelligence - Informatics @ University of Edinburgh | 2024
- Focused on various machine learning frameworks, ranging from basic neural networks (RNN, CNN, MLP, etc.) to advanced modern architectures (Transformers, Diffusion Models, Large Multi-Modality Models, etc.)
- Dissertation focuses on accelerating preference-alignment techniques (such as LoRA fine-tuning, RLHF, DPO, and SFT) with serverless support. This project is supervised by Prof. Luo Mai.
B.Sc
Computer Science - University of Liverpool & XJTLU | 2020
- Research interests: algorithm design, optimisation, machine learning, and AI safety; engineering experience in C/C++/C#, Java, and web development.
- Dissertation explored various scheduling algorithms for modern smart grids. This project was supervised by Prof. Prudence Wong.
🚀 Experience
Core Contributor, Reviewer - ServerlessLLM | November 2024 - Present
- Build large-scale distributed inference systems using Hugging Face Transformers and vLLM.
- Designed and implemented an end-to-end serverless PEFT LoRA fine-tuning solution within the ServerlessLLM ecosystem to provide on-demand, cost-effective model customization services (PR #251, #189).
- Developed a multi-tenant serverless serving solution for LoRA adapters using Ray, achieving up to a 4.4x faster loading speed compared to the safetensors format by leveraging a multi-tier checkpoint loading mechanism (PR #248, #221).
Core Contributor, OSPP Mentor - Casibase | January 2024 - Present
- Enhanced the platform's core multi-modal capabilities: Deeply integrated various large multi-modal models to enable end-to-end functionalities for image understanding, generation, and mixed-media dialogue. Optimized user experience with features like drag-and-drop uploads and URL parsing (#925, #895, #717, #716).
- Expanded and optimized LM support: Integrated multiple industry-leading models and engineered a model provider multiplexing mechanism, allowing the system to dynamically select models based on load and cost (#785, #783, #703).
- Improved the core RAG workflow: Significantly boosted the quality of knowledge base vectorization and retrieval relevance by designing novel text-splitting strategies (#778, #727).
- Led full-stack development and performance optimization: Utilized Go (BeeGo) and React.js to independently deliver features including real-time billing & usage statistics (#898, #735), rich text rendering (LaTeX, code highlighting) (#775, #776), and front-end optimizations that enhanced message rendering speed and system stability (#777, #954).
LLM Research Intern - N8 CIR | June 2024 - September 2024