I am currently on the job market. I'd be happy to connect!
I am a Postdoctoral Research Fellow at the National University of Singapore, working with Jialin Li. I also closely collaborate with Irene Zhang and Dan Ports at Microsoft Research.
My research interests lie at the intersection of distributed systems, datacenter networks, dataplane operating systems, and machine learning systems. Currently, I am focusing on designing high-performance systems that leverage programmable hardware and ML-driven optimization to address challenges in modern datacenters.
Education
National University of Singapore
Singapore
Ph.D. in Computer Science (Advisor: Jialin Li)
Aug 2019 – Jan 2026
Seoul, South Korea
Bachelor's Degree in Computer Science
Mar 2012 – Aug 2019
Uppsala, Sweden
Exchange Student in Information Technology
Aug 2017 – Jan 2018
Work in Progress (projects that I’m currently leading)
Performance tuning remains a persistent challenge in modern datacenters, especially at microsecond scales. We are exploring new dataplane OS optimization techniques along two complementary directions:
(1) ML-driven optimization. We natively integrate machine learning into the OS dataplane to support dynamic parameter optimization at runtime [Details].
(2) Intra-host network-aware optimization. I am conducting a comprehensive analysis of the intra-host network, spanning hardware-level signals (e.g., PCIe, memory access patterns, cache efficiency, and DDIO), inter-device data exchange, device drivers, and the OS dataplane. We aim to extend this analysis to GPU hosts, where PCIe/NVLink interconnects and GPU memory hierarchies introduce additional complexity critical for ML workloads [Details].
Layer-4 load balancers are a popular approach to preventing high tail latencies, but existing solutions have fundamental limitations under unpredictable workloads and traffic bursts because they statically assign connections to servers. We propose the first dynamic L4 load balancer with μs-scale stateful connection migration. Our system leverages two trends, programmable switches and kernel-bypass networking, to efficiently implement TCP connection migration without packet loss while remaining transparent to clients.
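To make the connection-migration idea concrete, here is a toy Python sketch. All names and state fields here are hypothetical, not the system's actual API: real TCP migration also moves windows, options, and timers, and the switch buffers in-flight packets during the handover to avoid loss.

```python
from dataclasses import dataclass

@dataclass
class TCPState:
    """Illustrative subset of the per-connection state a backend must
    hand over on migration so the client notices nothing."""
    client: str
    seq: int          # next sequence number the server will send
    ack: int          # next byte expected from the client
    recv_buffer: bytes = b""

class Backend:
    def __init__(self, name):
        self.name = name
        self.conns = {}

    def export_state(self, client):
        # Freeze the connection and serialize its state for the target.
        return self.conns.pop(client)

    def import_state(self, state):
        # Resume the connection with identical seq/ack numbers.
        self.conns[state.client] = state

def migrate(lb_table, client, src, dst):
    """Hypothetical control-plane step: move one live connection and
    repoint the load balancer's forwarding entry to the new backend."""
    dst.import_state(src.export_state(client))
    lb_table[client] = dst

# Usage: move a live connection off an overloaded backend.
a, b = Backend("A"), Backend("B")
table = {"10.0.0.1": a}
a.conns["10.0.0.1"] = TCPState("10.0.0.1", seq=5000, ack=7000)
migrate(table, "10.0.0.1", a, b)
```

After `migrate` returns, the forwarding table points at backend B, which holds the identical sequence/acknowledgment state, so the client-side connection continues uninterrupted.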
Publications
[APSys '25]
ML-native Dataplane Operating Systems
Inho Choi, Anand Bonde, Jing Liu, Joshua Fried, Irene Zhang, Jialin Li.
16th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys 2025)
[arXiv]
A Primer on RecoNIC: RDMA-enabled Compute Offloading on SmartNIC
Guanwen Zhong, Aditya Kolekar, Burin Amornpaisannon, Inho Choi, Haris Javaid, Mario Baldi.
arXiv, Dec 2023
Summary
The main goal of this project was to bring network data as close to computation as possible. Our server had an FPGA-based SmartNIC to accelerate certain computation tasks. By default, data received from a remote peer over the network is first stored in the host's memory and then copied to the FPGA memory on the NIC over the PCIe bus. Once the computation finishes, the results are copied back to the host's memory and then sent to the remote peer. This CPU-centric approach introduces multiple data copies, causing significant inefficiency.
To address this problem, we developed a new platform by co-designing the SmartNIC hardware shell and the host software stack. The hardware shell integrates an RDMA offload engine and FPGA-based programmable compute logic modules, within which developers can design their own accelerators. Network data arriving from remote peers via RDMA can be stored either in FPGA memory for accelerated computation or in host memory, and the RDMA engine can also directly perform RDMA operations on the host memory of remote peers. The software stack includes a user-space library and kernel-level device drivers that let host applications access device memory and configure device registers.
My main task was verifying the end-to-end workflow and testing an example use case: accelerating matrix multiplication with the framework. We store two matrices on a remote machine, and the RecoNIC host issues RDMA READs for them, which land directly in device memory. Once both matrices are ready, the host issues a compute control command to the Compute Logic to start computation. When the computation finishes, the host reads the result back into host memory for verification.
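The workflow above can be sketched as a small Python model of the host-side control flow. The function names here are illustrative stand-ins, not RecoNIC's actual driver/library API, which posts RDMA work requests and writes FPGA control registers.

```python
def rdma_read(remote_store, key, device_mem):
    """Model an RDMA READ whose payload lands directly in FPGA device
    memory, bypassing the host memory copy."""
    device_mem[key] = remote_store[key]

def compute_logic_matmul(device_mem):
    """Model the FPGA Compute Logic: multiply the two staged matrices."""
    A, B = device_mem["A"], device_mem["B"]
    n, k, m = len(A), len(B), len(B[0])
    device_mem["C"] = [
        [sum(A[i][x] * B[x][j] for x in range(k)) for j in range(m)]
        for i in range(n)
    ]

# Remote machine holds the two input matrices.
remote = {"A": [[1, 2], [3, 4]], "B": [[5, 6], [7, 8]]}
device_mem = {}                      # FPGA on-NIC memory

rdma_read(remote, "A", device_mem)   # step 1: RDMA READ inputs into device memory
rdma_read(remote, "B", device_mem)
compute_logic_matmul(device_mem)     # step 2: host issues the compute command
result = device_mem["C"]             # step 3: host reads the result back to verify
# result == [[19, 22], [43, 50]]
```

The point of the sketch is the data path: inputs move straight from the remote peer into device memory, and the host only orchestrates and verifies.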
[APSys '23]
Capybara: μSecond-scale live TCP migration
Inho Choi, Nimish Wadekar, Raj Joshi, Joshua Fried, Dan R. K. Ports, Irene Zhang, Jialin Li.
14th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys 2023)
[SIGCOMM ‘23]
Network Load Balancing with In-network Reordering Support for RDMA
Cha Hwan Song, Xin Zhe Khooi, Raj Joshi, Inho Choi, Jialin Li, and Mun Choon Chan.
Proceedings of the 2023 ACM SIGCOMM Conference
Summary
ConWeave is a new network load balancing system for RDMA (RoCEv2) traffic. In datacenters, RoCEv2 is widely used to implement RDMA over Ethernet networks, but it inherits many design assumptions from InfiniBand, such as lossless and in-order packet delivery. When an RNIC receives a packet out of order, it treats this as an indication of packet loss due to network congestion and initiates loss recovery, which causes the sending RNIC to decrease its sending rate. However, existing datacenter load balancing mechanisms, such as per-packet or per-flowlet load balancing, often cause out-of-order delivery, making them hard to apply to RDMA.
We therefore designed a new load balancing system for RDMA whose main idea is in-network packet reordering. We reorder in the network rather than at the destination host because RDMA bypasses the host OS and the behavior of commodity RDMA hardware is hard to modify. Instead, we leverage programmable network switches: recent switches such as Intel Tofino2 can store packets in the network using multiple FIFO queues with pause/resume support.
ConWeave is a ToR-to-ToR mechanism: it makes rerouting (load balancing) decisions at the source ToR and reorders packets at the destination ToR. The source ToR periodically monitors each flow's RTT; if the RTT exceeds a threshold, it treats the current route as congested and triggers rerouting. It tags the last packet sent on the current path with a TAIL flag and tags subsequent packets on the new path as REROUTED. If a REROUTED packet reaches the destination ToR before the TAIL packet, the destination ToR dynamically assigns a reorder queue and buffers the packet. Once the TAIL packet arrives, the destination ToR flushes the reorder queue and sends a CLEAR signal to the source ToR, indicating that no out-of-order packets remain and a new rerouting epoch can begin.
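The destination-ToR reordering logic can be sketched as a toy Python model. The packet format and flag handling are heavily simplified for illustration; the real design runs in the switch data plane using hardware FIFO queues with pause/resume.

```python
from collections import deque

def dst_tor_receive(packets):
    """Deliver packets in order: buffer REROUTED packets that race ahead
    of the TAIL packet on the old path, then flush once TAIL arrives."""
    delivered, reorder_q, waiting_for_tail = [], deque(), True
    for seq, flag in packets:
        if flag == "TAIL":
            delivered.append(seq)
            # Old path drained: flush the buffered new-path packets in
            # order and (conceptually) send CLEAR back to the SrcToR.
            delivered.extend(sorted(reorder_q))
            reorder_q.clear()
            waiting_for_tail = False
        elif flag == "REROUTED" and waiting_for_tail:
            reorder_q.append(seq)     # arrived early on the new path: buffer
        else:
            delivered.append(seq)     # normal in-order delivery
    return delivered

# Packets 3 and 4 take the new path and overtake the TAIL packet (seq 2).
arrivals = [(1, ""), (3, "REROUTED"), (4, "REROUTED"), (2, "TAIL"), (5, "")]
in_order = dst_tor_receive(arrivals)   # → [1, 2, 3, 4, 5]
```

Even though packets 3 and 4 arrive before packet 2, the receiving RNIC would see a strictly in-order stream and never trigger spurious loss recovery.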
[NSDI ‘23]
Hydra: Serialization-Free Network Ordering for Strongly Consistent Distributed Applications
Inho Choi, Ellis Michael, Yunfan Li, Dan Ports, and Jialin Li.
Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation
[S&P ‘20]
A Stealthier Partitioning Attack against Bitcoin Peer-to-Peer Network
Muoi Tran, Inho Choi, Gi Jun Moon, Viet-Anh Vu, and Min Suk Kang.
In Proceedings of the IEEE Symposium on Security and Privacy, May 2020.
[UbiComp Workshop ‘17]
Multimodal Data Collection Framework for Mental Stress Monitoring
Saewon Kye, Junhyung Moon, Juneil Lee, Inho Choi, Dongmi Cheon, and Kyoungwoo Lee.
In Proceedings of the 2017 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2017 ACM International Symposium on Wearable Computers. (Workshop Paper)
Experiences
National University of Singapore - Postdoctoral Research Fellow
Singapore
School of Computing (Advisor: Jialin Li)
Feb 2026 – Present
Microsoft Research - PhD Research Intern
Redmond, WA, USA
Systems Research Group (Mentor: Irene Zhang)
June 2025 – Sep 2025
Conducted research on ML-native dataplane OS architecture for automatic parameter tuning to address performance optimization challenges in modern datacenters at microsecond scales. Designed system architecture integrating machine learning natively into the OS dataplane for dynamic parameter optimization. Developed and evaluated the approach to demonstrate benefits of automated tuning across diverse workloads. Published initial findings at APSys '25, with full-system development currently in progress.
Microsoft Research - PhD Research Intern
Redmond, WA, USA
Systems Research Group (Mentor: Dan Ports)
June 2024 – Aug 2024
Conducted research on dynamic Layer-4 load balancing to address tail latency under unpredictable workloads and traffic bursts caused by static assignment of TCP connections to servers. Designed and implemented a new dynamic load balancing solution with microsecond-scale TCP connection migration, leveraging programmable switches and kernel-bypass technologies. Published initial findings at APSys '23, with a full paper currently under submission.
AMD - PhD Research Intern
Singapore
Xilinx - FPGA / System Design Lab (Mentor: Guanwen Zhong)
May 2023 – Aug 2023
Contributed to research on hardware accelerator architectures for datacenter networks, specializing in RDMA protocol optimization and computation offloading on FPGA-based SmartNIC platforms. Participated in implementing and validating solutions through FPGA prototyping and performance evaluation, with findings published on arXiv.
National University of Singapore - Research Intern
Singapore
Systems & Network Security Lab (Advisor: Min Suk Kang)
Sep 2018 – Feb 2019
Contributed to research on network-level security vulnerabilities in Bitcoin's peer-to-peer network. Participated in demonstrating partitioning attacks on Bitcoin's peer-to-peer network and defense mechanisms. Co-authored a paper published at IEEE S&P '20, advancing cryptocurrency network security.
Metlife - Summer Intern
Seoul, Korea
IT Planning Team
July 2018 – Aug 2018
Analyzed IT infrastructure and database server architecture at MetLife Korea, investigating OAuth 2.0 protocol adoption for security enhancement. Presented findings through internal seminar.
Yonsei University - Research Intern
Seoul, Korea
Dependable Computing Lab (Advisor: Kyoungwoo Lee)
Feb 2017 – May 2018
Contributed to development of a multimodal stress monitoring framework for analyzing people's physiological and behavioral reactions to stressors. Participated in designing and conducting experiments and implementing a real-time signal processing framework. Co-authored paper published at UbiComp Workshop '17, presenting the framework and experimental findings.
Awards
National University of Singapore
Oct 2023
National University of Singapore
Jan 2023
National University of Singapore
Aug 2019 – Jul 2023
Yonsei University
Aug 2018
Mentoring Experiences
Yiyang Liu — NUS, Singapore (2024)
[Master's Thesis] Enhancing Distributed Systems with Hydra: A Software Solution for Scalable Network Ordering