Virtual Threads, Part 2

Author: Ter-Petrosyan Hakob

Java’s virtual threads (from Project Loom) are a game-changer for high-concurrency applications. They let us create millions of threads without blowing up memory or overwhelming the OS. In this post, we’ll explore how virtual threads work under the hood in simple terms, and how they differ from traditional platform threads (the normal Java threads backed by OS threads). We’ll look at how the JVM schedules virtual threads, what mounting and unmounting mean, how stack memory is handled, what continuations are, how virtual threads behave during blocking calls and synchronized blocks, and some limitations (like pinning) and best practices to get the most out of this new feature.

What Are Virtual Threads (vs Platform Threads)?

Platform threads (the regular threads we’ve always used in Java) are essentially a one-to-one wrapper around an OS thread. Each platform thread is managed by the operating system, has a relatively large fixed stack (often around 1MB by default), and context-switching between them is handled by the OS. Because they’re heavy, creating thousands of platform threads can exhaust system resources.

Virtual threads, introduced as a preview in JDK 19 and finalized in JDK 21, are lightweight threads managed by the JVM rather than the OS. A virtual thread is not tied one-to-one to a native thread. Instead, many virtual threads can share a few OS threads under the hood. Key differences between virtual and platform threads include:

  - Scheduling: platform threads are scheduled by the OS; virtual threads are scheduled by the JVM onto a small pool of carrier threads.
  - Stack: a platform thread gets a large, fixed-size native stack; a virtual thread’s stack lives in the Java heap, starts small, and grows on demand.
  - Cost: platform threads are expensive to create, so we pool them; virtual threads are cheap enough to create one per task, even millions of them.
  - Identity: a virtual thread is always a daemon thread, and its priority is fixed at Thread.NORM_PRIORITY.

In short, a virtual thread is a lightweight, managed thread that the JVM can park and resume efficiently. This lets us write code in the simple thread-per-request style and still handle huge concurrency.
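
For a quick feel of the API, here is a minimal sketch comparing the two (the Thread.ofPlatform()/Thread.ofVirtual() builder API is standard as of JDK 21):

public class CreationDemo {
    public static void main(String[] args) throws InterruptedException {
        // A classic platform thread: backed one-to-one by an OS thread
        Thread platform = Thread.ofPlatform().start(() ->
                System.out.println("platform: " + Thread.currentThread()));

        // A virtual thread: scheduled by the JVM onto a carrier thread
        Thread virtual = Thread.startVirtualThread(() ->
                System.out.println("virtual:  " + Thread.currentThread()));

        platform.join();
        virtual.join();
    }
}

The printed Thread.currentThread() lines make the difference visible: the platform thread prints a plain thread name, while the virtual thread prints something like VirtualThread[#21]/runnable@ForkJoinPool-1-worker-1, showing the carrier it is mounted on.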

How the JVM Schedules Virtual Threads

Scheduling virtual threads is the job of the JVM rather than the OS. The JVM uses a scheduler (an executor) internally to manage virtual threads. By default, it uses a dedicated fork-join pool (separate from the common pool) as the carrier thread pool. Carrier threads are the actual OS threads that carry out the work of virtual threads. By default, the number of carrier threads equals the number of CPU cores on your machine (this can be tuned via the JVM property -Djdk.virtualThreadScheduler.parallelism).

Here’s how scheduling works step by step:

  1. When you start a virtual thread, the JVM hands it to the scheduler, which queues it as a task.
  2. The scheduler mounts the virtual thread onto a free carrier thread, which then executes its code.
  3. The virtual thread runs until it finishes or hits a blocking operation (I/O, sleep, waiting on a lock).
  4. On a blocking operation, the JVM unmounts the virtual thread, parks it, and frees the carrier to run another ready virtual thread.
  5. When the blocking operation completes, the virtual thread becomes ready again and is mounted on the next free carrier, which may be a different one than before.

[Figure: virtual threads scheduled onto carrier (platform) threads, which run on OS threads]

Virtual threads (orange) are executed on platform threads (blue), which run on OS threads (green). A virtual thread currently running is mounted to a platform thread (solid boxes). Virtual threads that are ready but not yet running remain unmounted and waiting (blue dashed box: Unmounted Ready threads). Virtual threads that are blocked (e.g. waiting on I/O) are unmounted and parked (orange dashed box: Unmounted Blocked threads). The JVM’s scheduler keeps carrier threads busy by mounting different virtual threads when others are blocked.

Because of this scheduling model, blocking a virtual thread doesn’t block a carrier OS thread in many cases. The OS thread simply moves on to run other virtual threads. This is how we can have thousands of concurrent tasks without tying up thousands of OS threads.

Cooperative vs Preemptive Scheduling

It’s important to realize that JVM scheduling is cooperative. A virtual thread will run uninterrupted on a carrier until it voluntarily yields (for example, by hitting a blocking call). The JVM won’t preempt (stop) a running virtual thread just to schedule another one; switches happen only at these well-defined yield points. This means if you have a long CPU-bound calculation in a single virtual thread, it will not automatically be switched out to let others run on that same carrier thread.

If you have as many carrier threads as CPU cores, CPU-bound tasks on different carriers can run in parallel, but if you artificially limit the carrier thread pool (e.g. to 1 thread) and run a never-blocking loop on one virtual thread, it could starve others from running on that single carrier. In practice, the default carrier pool is sized to CPUs, so compute-bound tasks will use all cores, similar to platform threads. But virtual threads truly shine for I/O-bound and high-concurrency workloads where threads spend a lot of time waiting.
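
To see the cooperative behavior concretely, here is a small sketch that deliberately shrinks the carrier pool to a single thread using the documented scheduler properties (the exact starvation behavior may vary across JDK versions):

public class StarvationDemo {
    // Run with: -Djdk.virtualThreadScheduler.parallelism=1 -Djdk.virtualThreadScheduler.maxPoolSize=1
    public static void main(String[] args) throws InterruptedException {
        Thread spinner = Thread.ofVirtual().start(() -> {
            // A CPU-bound loop that never blocks, so it never yields its carrier
            while (!Thread.currentThread().isInterrupted()) { }
        });

        Thread polite = Thread.ofVirtual().start(() ->
                System.out.println("Finally got the carrier!"));

        Thread.sleep(1000);   // main is a platform thread, so it keeps running
        spinner.interrupt();  // let the spinner exit and free the single carrier
        polite.join();        // only now can the second virtual thread print
    }
}

With the single-carrier settings, "Finally got the carrier!" appears only after the spinner is interrupted; with the default pool size, it prints almost immediately.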

Example:

Suppose you have a server with 4 CPU cores and thus ~4 carrier threads by default. If you create 1000 virtual threads that each handle a network request, the JVM will schedule at most 4 of them to run at once (one per carrier). Whenever any of those threads hits an I/O wait (e.g., waiting for a database response), that thread is unmounted and another pending virtual thread is mounted on that carrier thread. In this way, the carriers are always kept busy doing real work, and none are stuck waiting on I/O. The result is high throughput with a small number of actual OS threads. The JVM’s goal is to keep the carrier threads busy but not blocked.
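
You can observe this with a small experiment (a sketch; it parses Thread.toString(), whose format is informative output rather than a stable API):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;

public class CarrierCountDemo {
    public static void main(String[] args) {
        Set<String> carriers = ConcurrentHashMap.newKeySet();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000; i++) {
                executor.submit(() -> {
                    // e.g. "VirtualThread[#32]/runnable@ForkJoinPool-1-worker-3"
                    String s = Thread.currentThread().toString();
                    carriers.add(s.substring(s.indexOf('@') + 1));
                    try { Thread.sleep(100); } catch (InterruptedException ignored) { }
                });
            }
        } // close() waits for all submitted tasks to finish
        System.out.println("1000 virtual threads ran on " + carriers.size() + " carrier threads");
    }
}

On a 4-core machine this typically reports a number close to the core count, confirming that a handful of carriers serve all 1000 tasks.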

Mounting and Unmounting: Running vs. Waiting in a Virtual Thread

We’ve already used the terms mounted and unmounted, which are central to how virtual threads work. Let’s define them clearly:

  - Mounted: the virtual thread is currently attached to a carrier (OS) thread, and its code is executing.
  - Unmounted: the virtual thread is detached from any carrier; its stack is saved on the heap, and it is either parked on a blocking operation or ready and queued to run.

A virtual thread will frequently mount and unmount during its lifecycle. It mounts to run, then unmounts when it waits, then mounts again (possibly on a different carrier thread) to continue running when it’s ready. This is how a single virtual thread can “jump” between different underlying OS threads over time. It’s very different from a platform thread, which stays on the same OS thread for its entire life.

For example, consider this simple code using a virtual thread:

public class MountingDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.ofVirtual().unstarted(() -> {
            System.out.println("Before sleep: " + Thread.currentThread());
            try {
                Thread.sleep(20);    // simulate a blocking call
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            System.out.println("After sleep: " + Thread.currentThread());
        });
        vt.start();
        vt.join();
    }
}

When you run this, the output might be something like:

Before sleep: VirtualThread[#21]/runnable@ForkJoinPool-1-worker-1
After sleep:  VirtualThread[#21]/runnable@ForkJoinPool-1-worker-7

Notice that the same VirtualThread[#21] was running on one carrier thread (ForkJoinPool-1-worker-1) before the sleep, but after waking up, it resumed on a different carrier thread (ForkJoinPool-1-worker-7)! This demonstrates mounting and unmounting in action. The virtual thread was mounted on the first worker, then during Thread.sleep() it got unmounted (freeing ForkJoinPool-1-worker-1 to do something else), and later the scheduler remounted it on ForkJoinPool-1-worker-7 to print the second line. From the virtual thread’s perspective, it just went to sleep and woke up – it doesn’t “know” it switched carriers.

How does the JVM park and unpark threads so smoothly? This is where the magic of continuations comes in, which we’ll cover soon. Essentially, when a virtual thread blocks, the JVM captures its current state (the call stack, program counter, etc.) into a continuation object and then frees the carrier thread. When it’s time to resume, the saved state is used to continue where it left off.

Mounting and unmounting happen very fast, much faster than an OS context switch. There is no heavy kernel context switch when a virtual thread unmounts – it’s more like saving and restoring a function call stack. This is why millions of virtual threads can be scheduled efficiently by the JVM.

Stack Memory Management: Virtual Threads vs Platform Threads

One of the reasons virtual threads are lightweight is how they handle stack memory. Every thread in Java has its own call stack where local variables and method calls live. With platform threads, this stack is a chunk of memory allocated by the OS (native memory). By default it can be quite large (on the order of 1MB, though it may not all be committed at once). If you try to create 100,000 platform threads, you’d need up to 100k × 1MB = ~100 GB of memory just for stacks, which is obviously impractical. Virtual threads use a different strategy:

  - A virtual thread’s stack is stored as a chunk object on the Java heap, not as a fixed block of native memory.
  - The stack starts tiny and grows and shrinks as the call depth changes.
  - When the thread unmounts, its stack frames are saved to the heap; when it mounts again, the frames it needs are copied back to the carrier’s stack.
  - Because stacks live on the heap, memory from threads that have finished (or whose stacks have shrunk) can be reclaimed by the garbage collector.

Because of this design, virtual threads have a much smaller memory cost per thread. Instead of a big chunk of native memory per thread, we have many small stacks in the heap that expand only as needed. As one source put it, platform threads use “megabyte-scale chunks of memory” for the stack, while a virtual thread’s stack footprint “starts at just a few hundred bytes” and grows in the heap. This is why you can create millions of virtual threads – their total memory usage is manageable and mostly garbage-collectible. In effect, the cost of a thread is almost as small as the cost of a Runnable or a Task object, plus whatever actual stack depth it’s using at the moment.
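
The classic way to see this is to simply start a huge number of virtual threads, something that would fail with OutOfMemoryError using platform threads on most machines (a sketch; adjust the count to your heap size):

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class MillionThreadsDemo {
    public static void main(String[] args) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            threads.add(Thread.startVirtualThread(() -> {
                try {
                    Thread.sleep(Duration.ofSeconds(10)); // parked: the stack is saved to the heap
                } catch (InterruptedException ignored) { }
            }));
        }
        System.out.println("Started " + threads.size() + " virtual threads");
        for (Thread t : threads) {
            t.join();
        }
    }
}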

Continuations: The Secret to Pausing and Resuming

How can a thread be suspended and resumed in the middle of execution? The answer is continuations. A continuation is a low-level construct that represents a segment of code that can be paused (yielded) and resumed later, preserving its state.

In Project Loom, each virtual thread is backed by a Continuation object internally. You can think of a continuation like a bookmark in the execution of the thread. When the virtual thread needs to pause (for example, it calls a blocking I/O operation), the continuation mechanism creates a snapshot of that thread’s execution state (the call stack, instruction pointer, etc.). This is similar to how if you pause a single-player video game, the game saves all your progress and state so you can come back later exactly where you left off.

When the thread is ready to run again (say the I/O is done), the JVM uses the continuation to resume execution from where it paused. All the local variables, the point in the code, etc., are restored, and the thread continues as if nothing happened – except it might now be running on a different carrier thread.

To put it simply: A virtual thread = a scheduled continuation (task) managed by the JVM scheduler. The continuation provides the ability to suspend and resume, and the scheduler provides the where and when to run it.

These continuations are delimited, meaning the JVM controls exactly where a virtual thread is allowed to pause (typically at yield points such as blocking calls). The Java code in a virtual thread can’t pause at an arbitrary CPU instruction; it pauses only in well-defined places that the JVM knows about (like method calls that go into the JVM’s thread parking logic).

You normally don’t interact with Continuation objects directly when writing application code (the Continuation API is internal/preview for now), but it’s useful to know they exist. They’re the “magic” that makes virtual thread suspension possible. One author described a continuation as “sequential code that may suspend or yield execution at some point by itself and can be resumed by a caller”. In our context, the “caller” is the JVM scheduler that resumes the virtual thread when it’s time.
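
For the curious, the mechanism can be poked at directly. Note that jdk.internal.vm.Continuation is an unsupported, internal API that may change or disappear between releases and requires an --add-exports flag; the sketch below exists only to illustrate the yield/resume idea:

import jdk.internal.vm.Continuation;
import jdk.internal.vm.ContinuationScope;

public class ContinuationDemo {
    // Compile and run with: --add-exports java.base/jdk.internal.vm=ALL-UNNAMED
    public static void main(String[] args) {
        ContinuationScope scope = new ContinuationScope("demo");
        Continuation cont = new Continuation(scope, () -> {
            System.out.println("step 1");
            Continuation.yield(scope);  // suspend here; control returns to the caller of run()
            System.out.println("step 2");
        });

        cont.run();  // prints "step 1", then the continuation yields
        System.out.println("suspended, done = " + cont.isDone());
        cont.run();  // resumes right after the yield, prints "step 2"
    }
}

Run it once and the continuation executes until the yield; run it again and it picks up exactly where it stopped. That save-and-resume is what the JVM does for a virtual thread at every blocking call.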

So, when you see a virtual thread go to sleep or wait on I/O, under the hood the JVM is doing roughly: save continuation (thread state) → free carrier thread → later, restore continuation on a carrier thread → continue execution. All of this is optimized in the JVM so that it’s extremely fast – certainly much faster than the equivalent context switch performed by the OS scheduler.

Blocking I/O in Virtual Threads – What Really Happens?

One of the biggest benefits of virtual threads is that you can call blocking I/O operations (like reading from a socket, waiting for a file, calling a database, etc.) without worrying about blocking an OS thread. In a traditional thread-per-request model, if a thread blocked on I/O, that OS thread would sit idle, unable to do other work, until the I/O finished. With thousands of concurrent operations, that meant having thousands of OS threads, which doesn’t scale.

With virtual threads, when a thread blocks on I/O, the JVM tries to transparently turn it into a non-blocking operation and park the virtual thread. For example, when a virtual thread performs a socket read, the JDK underneath uses asynchronous I/O (if available on the platform) or a lightweight mechanism to wait for data, and it unmounts the virtual thread while waiting. The carrier thread is released to run other tasks. Once the data is ready, the virtual thread is resumed (mounted again) and continues processing the data.

From the developer’s perspective, it looks like a normal blocking call – you can use the simple synchronous I/O APIs – but it behaves like asynchronous I/O under the hood, giving you the best of both worlds (simplicity and scalability). Virtual threads are often described as allowing “structured concurrency with threads” – you write sequential code that blocks, but it scales like asynchronous code.

Example:

If you use a virtual thread per incoming HTTP request in a server, when one request handler calls database.query() and that query is waiting on the DB, that virtual thread will unmount. The OS thread running it will go pick up another request’s thread to run in the meantime. When the DB query results return, the original virtual thread is woken up and continues sending the response. Meanwhile, the OS threads were never idle; they were always busy handling whatever work was ready.

This is why virtual threads are especially useful for I/O-bound workloads where threads spend a lot of time waiting. You can spawn massive numbers of threads to handle many concurrent I/O operations, and the JVM will efficiently manage them so that only the active ones tie up carrier threads.
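
In code, the thread-per-request style looks as plain as it sounds. A minimal sketch, with Thread.sleep() standing in for a real blocking call such as a database query:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadPerRequestDemo {
    public static void main(String[] args) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                int requestId = i;
                executor.submit(() -> handle(requestId));
            }
        } // close() blocks until every submitted task has finished
    }

    static void handle(int requestId) {
        try {
            Thread.sleep(50); // stands in for database.query(), an HTTP call, etc.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // ... build and send the response ...
    }
}

Each of the 10,000 tasks blocks, yet only about a core’s worth of OS threads is ever busy; the rest of the “threads” are parked continuations on the heap.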

What about file I/O or other blocking calls? Most of Java’s classic I/O (especially networking) is handled in a way that lets Loom unmount the thread. However, certain operations are not yet Loom-optimized. For instance, blocking file I/O (like reading a file from disk) may not unmount the virtual thread on some platforms. If a virtual thread calls a blocking file I/O operation, the current implementation might pin the virtual thread to the carrier until the I/O is done (because the JVM can’t easily park and switch on a purely kernel-level file read). The Loom team has worked to cover many common operations, but it’s good to be aware that not every blocking call is virtual-thread friendly. In practice, network I/O, sleep(), wait(), and other high-level blocking APIs are handled.

Next, let’s discuss a tricky area: synchronized blocks and other locks, which introduce the concept of pinning.

Pinning: Synchronized Blocks and Other Limitations

Pinning is a situation where a virtual thread cannot be unmounted from its carrier thread, effectively tying that carrier thread down. Pinning is not an error – the code still works correctly – but it means you lose the scalability benefits for that duration, since the OS thread is stuck waiting along with the virtual thread. The goal is to avoid or minimize pinning, or design around it.

When does pinning happen? According to the JDK documentation and design of Loom, there are two main cases where a virtual thread gets pinned to its carrier and won’t unmount on block:

  1. While executing inside a synchronized block or method. If a virtual thread holds a monitor lock (entered a synchronized section) and then it hits a blocking operation, the JVM will not unmount it. It stays pinned to the carrier until the synchronized block is exited. This is because allowing it to unmount could lead to thread-safety issues; the lock is tied to the thread holding it, and if that thread “disappeared” off the carrier, the JVM would have to prevent any other thread from entering that monitor. It’s simpler and safer to just keep it pinned. Additionally, if a virtual thread is trying to enter a synchronized block (waiting for a lock) and gets blocked because another thread holds the lock, it is effectively pinned waiting for that monitor. Several virtual threads contending on a synchronized lock can each pin a carrier thread while they wait. This was observed in real cases: e.g., if 4 virtual threads are all waiting on the same lock, and you only have 4 carrier threads, those 4 carriers can all become stuck (pinned) waiting – meaning no other work can proceed until one gets the lock and releases the others.
  2. When calling a native method or foreign function. If your virtual thread calls into native code (via JNI or a foreign function interface) that blocks in the native layer, the JVM cannot suspend and resume that at the Java level. The thread will remain pinned to the OS thread until the native call returns. Essentially, the JVM only controls Java code; once you’re in native code, Loom’s tricks can’t be applied.

There may be other niche situations as Loom evolves (for example, the early Loom versions pinned for some file I/O as mentioned, or certain JDK synchronization constructs), but synchronized and native calls are the primary culprits.

Why is pinning bad for scalability? Because if a carrier thread is pinned, it cannot be used to run any other virtual thread until the pinning is over. If enough carrier threads become pinned, new virtual threads will be stuck waiting for a free carrier. In the worst case, if all carriers are pinned (say, all are stuck in long sync blocks), your application throughput stalls – no other work can be scheduled until a carrier frees up. The JVM can mitigate this a bit by creating additional carrier threads if the pool is not at its max size, to avoid a deadlock of progress. By default the carrier pool can grow up to some limit (for example, 256 threads). But if you consistently pin a lot of threads, you could hit those limits or degrade performance.

Practical example of pinning: Imagine a web server where each request is handled by a virtual thread. Suppose a certain handler method has a synchronized block (maybe guarding access to a static resource) and inside that block it makes a blocking call (like writing to a file or waiting on a condition). If many requests hit this simultaneously, one gets the lock and the others block. All those blocked virtual threads are pinned on their carriers. If the number of contending threads equals your carrier threads, you’ve effectively made your server single-threaded until that lock is released – not good! Netflix engineers ran into a similar issue with Tomcat + virtual threads, where a lock caused multiple virtual threads to pin and exhaust the carrier pool, leading to stalled throughput.
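
The usual remedy on current JDKs is to replace synchronized around blocking calls with a java.util.concurrent lock, which a virtual thread can block on without pinning. A sketch of both variants:

import java.util.concurrent.locks.ReentrantLock;

public class LockChoiceDemo {
    private final ReentrantLock lock = new ReentrantLock();

    // Pins: blocking while holding a monitor keeps the carrier stuck too
    synchronized void pinnedVariant() throws InterruptedException {
        Thread.sleep(100); // the carrier thread waits here along with the virtual thread
    }

    // Unmounts: a virtual thread blocked under a ReentrantLock frees its carrier
    void friendlyVariant() throws InterruptedException {
        lock.lock();
        try {
            Thread.sleep(100); // the virtual thread parks; the carrier runs other work
        } finally {
            lock.unlock();
        }
    }
}

On JDKs that support it, running with -Djdk.tracePinnedThreads=full makes the JVM print a stack trace whenever a virtual thread blocks while pinned, which is the easiest way to find these spots in your own code.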

Best Practices and Things to Avoid with Virtual Threads

Virtual threads are powerful, but to use them effectively, keep these best practices and tips in mind:

Don’t pool virtual threads: they are cheap to create, so spawn one per task (for example, via Executors.newVirtualThreadPerTaskExecutor()) rather than reusing them through a thread pool.

Avoid blocking inside synchronized: keep critical sections short, and prefer java.util.concurrent locks such as ReentrantLock around code that may block, so the virtual thread can unmount (see the sketch at the end of the previous section).

Limit concurrency with a Semaphore, not a thread pool: when protecting a scarce resource, cap how many threads use it at once instead of capping how many threads exist (a sketch follows this list).

Be careful with ThreadLocal: with millions of threads, a heavyweight value cached per thread adds up quickly; prefer passing state explicitly.

Use updated tools for thread dumps: traditional jstack might not list all virtual threads or their stack traces by default (to avoid overwhelming output). Instead, use jcmd <pid> Thread.dump_to_file or JDK Mission Control/Flight Recorder, which have added support for virtual threads. These tools can capture the state of virtual threads if needed. Be aware that an idle virtual thread might not show up with a meaningful stack (it’s parked).
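
As promised above, here is a minimal sketch of using a Semaphore to cap concurrency over a limited resource (expensiveBlockingCall() is a placeholder for the real operation):

import java.util.concurrent.Semaphore;

public class LimitedResource {
    private final Semaphore permits = new Semaphore(10); // at most 10 concurrent calls

    String call() throws InterruptedException {
        permits.acquire(); // a virtual thread parks cheaply here; no carrier is blocked
        try {
            return expensiveBlockingCall();
        } finally {
            permits.release();
        }
    }

    private String expensiveBlockingCall() throws InterruptedException {
        Thread.sleep(100); // placeholder for the real, rate-limited operation
        return "result";
    }
}

Unlike a fixed thread pool, this limits how many tasks touch the resource at once while still letting you create an unbounded number of cheap virtual threads.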

Conclusion

Java’s virtual threads allow us to write straightforward concurrent code (like spawning a thread per task) that scales to huge levels of concurrency. They work by using a small pool of OS carrier threads to run a large number of virtual threads through a clever scheduling of mounting (running) and unmounting (parking) threads as they block and resume. Under the hood, the JVM uses continuations to capture a thread’s state and manage stack frames on the heap, making suspension and resumption of threads extremely fast. Virtual threads shine for applications that are I/O-bound or have a high level of concurrency with a lot of waiting (such as servers handling many requests), enabling near-optimal hardware utilization with a simple programming model.

By understanding how the JVM handles virtual threads – especially the cooperative nature of scheduling and the concept of pinning – developers can design applications to avoid pitfalls. Keep critical sections short, avoid blocking operations under locks, and be mindful of any code that might pin or block a carrier thread. With these best practices, virtual threads can drastically simplify writing high-throughput concurrent applications in Java, giving you the performance of asynchronous code with the readability of sequential code.

Summary:

Virtual threads are lightweight threads managed by the JVM that allow you to create thousands or millions of concurrent tasks with minimal overhead. They are scheduled cooperatively on a small set of carrier (OS) threads, automatically pausing on blocking calls and resuming later, which frees up the OS threads to do other work. Embrace virtual threads for simpler concurrent code, but use them wisely by avoiding long blocking critical sections and understanding their scheduling model. With Project Loom’s virtual threads, you can have the simplicity of one-thread-per-task and achieve the scalability that was once only possible with complex async frameworks – truly the best of both worlds for Java developers.