I am currently on the job market. I'd be happy to connect!
I am a Postdoctoral Research Fellow at the National University of Singapore, working with Jialin Li. I also closely collaborate with Irene Zhang and Dan Ports at Microsoft Research.
My research interests lie at the intersection of distributed systems, datacenter networks, dataplane operating systems, and machine learning systems. Currently, I am focusing on designing high-performance systems that leverage programmable hardware and ML-driven optimization to address challenges in modern datacenters.
Education
National University of Singapore
Singapore
Ph.D. in Computer Science (Advisor: Jialin Li)
Aug 2019 – Jan 2026
Seoul, South Korea
Bachelor's Degree in Computer Science
Mar 2012 – Aug 2019
Uppsala, Sweden
Exchange Student in Information Technology
Aug 2017 – Jan 2018
Work in Progress (projects that I’m currently leading)
Performance tuning remains a persistent challenge in modern datacenters, especially at microsecond scales. We are exploring new dataplane OS optimization techniques along two complementary directions:
(1) ML-driven optimization. We natively integrate machine learning into the OS dataplane to support dynamic parameter optimization at runtime [Details].
(2) Intra-host network-aware optimization. I am conducting a comprehensive analysis of the intra-host network, spanning hardware-level signals (e.g., PCIe, memory access patterns, cache efficiency, and DDIO), inter-device data exchange, device drivers, and the OS dataplane. We aim to extend this analysis to GPU hosts, where PCIe/NVLink interconnects and GPU memory hierarchies introduce additional complexity critical for ML workloads [Details].
Layer-4 load balancers are a popular approach to preventing high tail latencies, but existing solutions have fundamental limitations under unpredictable workloads and traffic bursts because they statically assign connections to servers. We propose the first dynamic L4 load balancer with μs-scale stateful connection migration. Our system leverages two trends, programmable switches and kernel-bypass networking, to efficiently implement TCP connection migration without packet loss while remaining transparent to clients.
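To make the connection-migration idea concrete, here is a toy Python sketch. All names and state fields here are hypothetical, not the system's actual API: real TCP migration also moves windows, options, and timers, and the switch buffers in-flight packets during the handover to avoid loss.

```python
from dataclasses import dataclass

@dataclass
class TCPState:
    """Illustrative subset of the per-connection state a backend must
    hand over on migration so the client notices nothing."""
    client: str
    seq: int          # next sequence number the server will send
    ack: int          # next byte expected from the client
    recv_buffer: bytes = b""

class Backend:
    def __init__(self, name):
        self.name = name
        self.conns = {}

    def export_state(self, client):
        # Freeze the connection and serialize its state for the target.
        return self.conns.pop(client)

    def import_state(self, state):
        # Resume the connection with identical seq/ack numbers.
        self.conns[state.client] = state

def migrate(lb_table, client, src, dst):
    """Hypothetical control-plane step: move one live connection and
    repoint the load balancer's forwarding entry to the new backend."""
    dst.import_state(src.export_state(client))
    lb_table[client] = dst

# Usage: move a live connection off an overloaded backend.
a, b = Backend("A"), Backend("B")
table = {"10.0.0.1": a}
a.conns["10.0.0.1"] = TCPState("10.0.0.1", seq=5000, ack=7000)
migrate(table, "10.0.0.1", a, b)
```

After `migrate` returns, the forwarding table points at backend B, which holds the identical sequence/acknowledgment state, so the client-side connection continues uninterrupted.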
Publications
[APSys '25]
ML-native Dataplane Operating Systems
Inho Choi, Anand Bonde, Jing Liu, Joshua Fried, Irene Zhang, Jialin Li.
16th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys 2025)
[arXiv]
A Primer on RecoNIC: RDMA-enabled Compute Offloading on SmartNIC
Guanwen Zhong, Aditya Kolekar, Burin Amornpaisannon, Inho Choi, Haris Javaid, Mario Baldi.
arXiv, Dec 2023
Summary
The main goal of this project was to bring network data as close to computation as possible. Our server had an FPGA-based SmartNIC to accelerate certain computation tasks. By default, data received from a remote peer over the network is first stored in the host's memory and then copied to the FPGA memory on the NIC over the PCIe bus. Once the computation finishes, the results are copied back to the host's memory and then sent to the remote peer. This CPU-centric approach introduces multiple data copies, causing significant inefficiency.
To address this problem, we developed a new platform by co-designing the SmartNIC hardware shell and the host software stack. The hardware shell integrates an RDMA offload engine and FPGA-based programmable compute logic modules, within which developers can design their own accelerators. Network data arriving from remote peers via RDMA can be stored either in FPGA memory for accelerated computation or in host memory, and the RDMA engine can also directly perform RDMA operations on the host memory of remote peers. The software stack includes a user-space library and kernel-level device drivers that let host applications access device memory and configure device registers.
My main task was verifying the end-to-end workflow and testing an example use case: accelerating matrix multiplication with the framework. We store two matrices on a remote machine, and the RecoNIC host issues RDMA READs for them, which land directly in device memory. Once both matrices are ready, the host issues a compute control command to the Compute Logic to start computation. When the computation finishes, the host reads the result back into host memory for verification.
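The workflow above can be sketched as a small Python model of the host-side control flow. The function names here are illustrative stand-ins, not RecoNIC's actual driver/library API, which posts RDMA work requests and writes FPGA control registers.

```python
def rdma_read(remote_store, key, device_mem):
    """Model an RDMA READ whose payload lands directly in FPGA device
    memory, bypassing the host memory copy."""
    device_mem[key] = remote_store[key]

def compute_logic_matmul(device_mem):
    """Model the FPGA Compute Logic: multiply the two staged matrices."""
    A, B = device_mem["A"], device_mem["B"]
    n, k, m = len(A), len(B), len(B[0])
    device_mem["C"] = [
        [sum(A[i][x] * B[x][j] for x in range(k)) for j in range(m)]
        for i in range(n)
    ]

# Remote machine holds the two input matrices.
remote = {"A": [[1, 2], [3, 4]], "B": [[5, 6], [7, 8]]}
device_mem = {}                      # FPGA on-NIC memory

rdma_read(remote, "A", device_mem)   # step 1: RDMA READ inputs into device memory
rdma_read(remote, "B", device_mem)
compute_logic_matmul(device_mem)     # step 2: host issues the compute command
result = device_mem["C"]             # step 3: host reads the result back to verify
# result == [[19, 22], [43, 50]]
```

The point of the sketch is the data path: inputs move straight from the remote peer into device memory, and the host only orchestrates and verifies.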
[APSys '23]
Capybara: μSecond-scale live TCP migration
Inho Choi, Nimish Wadekar, Raj Joshi, Joshua Fried, Dan R. K. Ports, Irene Zhang, Jialin Li.
14th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys 2023)
[SIGCOMM ‘23]
Network Load Balancing with In-network Reordering Support for RDMA
Cha Hwan Song, Xin Zhe Khooi, Raj Joshi, Inho Choi, Jialin Li, and Mun Choon Chan.
Proceedings of the 2023 ACM SIGCOMM Conference
Summary
ConWeave is a new network load balancing system for RDMA (RoCEv2) traffic. In datacenters, RoCEv2 is widely used to implement RDMA over Ethernet networks, but it inherits many design assumptions from InfiniBand, such as lossless and in-order packet delivery. When an RNIC receives a packet out of order, it treats this as an indication of packet loss due to network congestion and initiates loss recovery, which causes the sending RNIC to decrease its sending rate. However, existing datacenter load balancing mechanisms, such as per-packet or per-flowlet load balancing, often cause out-of-order delivery, making them hard to apply to RDMA.
We therefore designed a new load balancing system for RDMA whose main idea is in-network packet reordering. We reorder in the network rather than at the destination host because RDMA bypasses the host OS and the behavior of commodity RDMA hardware is hard to modify. Instead, we leverage programmable network switches: recent switches such as Intel Tofino2 can store packets in the network using multiple FIFO queues with pause/resume support.
ConWeave is a ToR-to-ToR mechanism: it makes rerouting (load balancing) decisions at the source ToR and reorders packets at the destination ToR. The source ToR periodically monitors each flow's RTT; if the RTT exceeds a threshold, it treats the current route as congested and triggers rerouting. It tags the last packet sent on the current path with a TAIL flag and tags subsequent packets on the new path as REROUTED. If a REROUTED packet reaches the destination ToR before the TAIL packet, the destination ToR dynamically assigns a reorder queue and buffers the packet. Once the TAIL packet arrives, the destination ToR flushes the reorder queue and sends a CLEAR signal to the source ToR, indicating that no out-of-order packets remain and a new rerouting epoch can begin.
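The destination-ToR reordering logic can be sketched as a toy Python model. The packet format and flag handling are heavily simplified for illustration; the real design runs in the switch data plane using hardware FIFO queues with pause/resume.

```python
from collections import deque

def dst_tor_receive(packets):
    """Deliver packets in order: buffer REROUTED packets that race ahead
    of the TAIL packet on the old path, then flush once TAIL arrives."""
    delivered, reorder_q, waiting_for_tail = [], deque(), True
    for seq, flag in packets:
        if flag == "TAIL":
            delivered.append(seq)
            # Old path drained: flush the buffered new-path packets in
            # order and (conceptually) send CLEAR back to the SrcToR.
            delivered.extend(sorted(reorder_q))
            reorder_q.clear()
            waiting_for_tail = False
        elif flag == "REROUTED" and waiting_for_tail:
            reorder_q.append(seq)     # arrived early on the new path: buffer
        else:
            delivered.append(seq)     # normal in-order delivery
    return delivered

# Packets 3 and 4 take the new path and overtake the TAIL packet (seq 2).
arrivals = [(1, ""), (3, "REROUTED"), (4, "REROUTED"), (2, "TAIL"), (5, "")]
in_order = dst_tor_receive(arrivals)   # → [1, 2, 3, 4, 5]
```

Even though packets 3 and 4 arrive before packet 2, the receiving RNIC would see a strictly in-order stream and never trigger spurious loss recovery.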
[NSDI ‘23]
Hydra: Serialization-Free Network Ordering for Strongly Consistent Distributed Applications
Inho Choi, Ellis Michael, Yunfan Li, Dan Ports, and Jialin Li.
Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation
[S&P ‘20]
A Stealthier Partitioning Attack against Bitcoin Peer-to-Peer Network
Muoi Tran, Inho Choi, Gi Jun Moon, Viet-Anh Vu, and Min Suk Kang.
In Proceedings of the IEEE Symposium on Security and Privacy, May 2020.
[UbiComp Workshop ‘17]
Multimodal Data Collection Framework for Mental Stress Monitoring
Saewon Kye, Junhyung Moon, Juneil Lee, Inho Choi, Dongmi Cheon, and Kyoungwoo Lee.
In Proceedings of the 2017 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2017 ACM International Symposium on Wearable Computers. (Workshop Paper)
Experiences
National University of Singapore - Postdoctoral Research Fellow
Singapore
School of Computing (Advisor: Jialin Li)
Feb 2026 – Present
Microsoft Research - PhD Research Intern
Redmond, WA, USA
Systems Research Group (Mentor: Irene Zhang)
June 2025 – Sep 2025
Conducted research on ML-native dataplane OS architecture for automatic parameter tuning to address performance optimization challenges in modern datacenters at microsecond scales. Designed system architecture integrating machine learning natively into the OS dataplane for dynamic parameter optimization. Developed and evaluated the approach to demonstrate benefits of automated tuning across diverse workloads. Published initial findings at APSys '25, with full-system development currently in progress.
Microsoft Research - PhD Research Intern
Redmond, WA, USA
Systems Research Group (Mentor: Dan Ports)
June 2024 – Aug 2024
Conducted research on dynamic Layer-4 load balancing to address tail latency under unpredictable workloads and traffic bursts caused by static assignment of TCP connections to servers. Designed and implemented a new dynamic load balancing solution with microsecond-scale TCP connection migration, leveraging programmable switches and kernel-bypass technologies. Published initial findings at APSys '23, with a full paper currently under submission.
AMD - PhD Research Intern
Singapore
Xilinx - FPGA / System Design Lab (Mentor: Guanwen Zhong)
May 2023 – Aug 2023
Contributed to research on hardware accelerator architectures for datacenter networks, specializing in RDMA protocol optimization and computation offloading on FPGA-based SmartNIC platforms. Participated in implementing and validating solutions through FPGA prototyping and performance evaluation, with findings published on arXiv.
National University of Singapore - Research Intern
Singapore
Systems & Network Security Lab (Advisor: Min Suk Kang)
Sep 2018 – Feb 2019
Contributed to research on network-level security vulnerabilities in Bitcoin's peer-to-peer network. Participated in demonstrating partitioning attacks on Bitcoin's peer-to-peer network and defense mechanisms. Co-authored a paper published at IEEE S&P '20, advancing cryptocurrency network security.
Metlife - Summer Intern
Seoul, Korea
IT Planning Team
July 2018 – Aug 2018
Analyzed IT infrastructure and database server architecture at MetLife Korea, investigating OAuth 2.0 protocol adoption for security enhancement. Presented findings through internal seminar.
Yonsei University - Research Intern
Seoul, Korea
Dependable Computing Lab (Advisor: Kyoungwoo Lee)
Feb 2017 – May 2018
Contributed to development of a multimodal stress monitoring framework for analyzing people's physiological and behavioral reactions to stressors. Participated in designing and conducting experiments and implementing a real-time signal processing framework. Co-authored paper published at UbiComp Workshop '17, presenting the framework and experimental findings.
Awards
National University of Singapore
Oct 2023
National University of Singapore
Jan 2023
National University of Singapore
Aug 2019 – Jul 2023
Yonsei University
Aug 2018
Mentoring Experiences
Yiyang Liu — NUS, Singapore (2024)
[Master's Thesis] Enhancing Distributed Systems with Hydra: A Software Solution for Scalable Network Ordering