
From 'It Might Work' to 'It Will Work': Typestate in Rust
- The Problem: Runtime Invariants vs Compile-Time Guarantees
- Enter the Typestate Pattern
- Seeing the Compiler Catch the Bug
- Advanced Patterns: Session Types and Protocol Enforcement
- Alternative Approach: Multiple State Types
- Combining with Result Types for Fallible Transitions
- Builder Pattern with Typestate
- Performance Considerations
- When to Use This Pattern
- Limitations and Trade-offs
- Conclusion
One of Rust’s greatest strengths lies not just in memory safety, but in its ability to encode business logic and invariants directly into the type system. Today, we’ll explore how to design APIs that make invalid states literally impossible to represent, pushing error detection from runtime to compile time.
The Problem: Runtime Invariants vs Compile-Time Guarantees
Consider a simple file handle API. In most languages, you might write something like this:
struct FileHandle {
    path: String,
    is_open: bool,
    content: Option<String>,
}

impl FileHandle {
    fn open(path: String) -> Result<Self, std::io::Error> {
        // Open file logic
        Ok(FileHandle {
            path,
            is_open: true,
            content: None,
        })
    }

    fn read(&mut self) -> Result<&str, std::io::Error> {
        if !self.is_open {
            return Err(std::io::Error::new(
                std::io::ErrorKind::InvalidInput,
                "Cannot read from closed file"
            ));
        }
        // Read logic...
        Ok("file content")
    }

    fn close(&mut self) {
        self.is_open = false;
        self.content = None;
    }
}
This approach has several problems:
- Runtime checks for every operation
- Possible inconsistent state (what if is_open is true but content is None?)
- Easy to forget checks, leading to bugs
- No compile-time guarantees about correct usage
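To make the last point concrete, here is a minimal sketch of the misuse the compiler happily accepts. It uses a cut-down stand-in for the handle above (no real file is opened), and read_after_close is a hypothetical helper introduced only for this illustration:

```rust
use std::io;

// Simplified stand-in for the runtime-checked handle; no real file is opened.
struct FileHandle {
    is_open: bool,
}

impl FileHandle {
    fn open() -> Self {
        FileHandle { is_open: true }
    }

    fn read(&mut self) -> Result<&str, io::Error> {
        if !self.is_open {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "Cannot read from closed file",
            ));
        }
        Ok("file content")
    }

    fn close(&mut self) {
        self.is_open = false;
    }
}

// Hypothetical helper: closes the handle, then reads from it anyway.
fn read_after_close() -> Result<String, io::Error> {
    let mut handle = FileHandle::open();
    handle.close();
    // This compiles without complaint; the mistake only surfaces when it runs.
    handle.read().map(|s| s.to_string())
}

fn main() {
    match read_after_close() {
        Ok(text) => println!("read: {text}"),
        Err(e) => println!("runtime error: {e}"),
    }
}
```

Nothing in the types distinguishes an open handle from a closed one, so the only line of defense is the check inside read.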
Enter the Typestate Pattern
The typestate pattern leverages Rust’s type system to encode object states as distinct types. Let’s redesign our file handle:
use std::fs::File;
use std::io::{self, Read};
use std::marker::PhantomData;

// State markers
struct Closed;
struct Open;

struct FileHandle<State> {
    path: String,
    file: Option<File>,
    _state: PhantomData<State>,
}

// Only closed files can be opened
impl FileHandle<Closed> {
    fn new(path: String) -> Self {
        FileHandle {
            path,
            file: None,
            _state: PhantomData,
        }
    }

    fn open(mut self) -> Result<FileHandle<Open>, (Self, io::Error)> {
        match File::open(&self.path) {
            Ok(file) => Ok(FileHandle {
                path: self.path,
                file: Some(file),
                _state: PhantomData,
            }),
            Err(e) => {
                self.file = None; // Ensure consistency
                Err((self, e))
            }
        }
    }
}

// Only open files can be read or closed
impl FileHandle<Open> {
    fn read(&mut self) -> Result<String, io::Error> {
        // No need to check if file is open - it's guaranteed by the type!
        // The file handle is consumed when opened, so we know it exists
        let file = self.file.as_mut().expect("File must exist in Open state");
        let mut contents = String::new();
        file.read_to_string(&mut contents)?;
        Ok(contents)
    }

    fn close(self) -> FileHandle<Closed> {
        // File is automatically dropped here, demonstrating resource cleanup
        FileHandle {
            path: self.path,
            file: None, // File is explicitly closed
            _state: PhantomData,
        }
    }
}
Now, attempting to read from a closed file is a compile-time error, and we get true resource safety:
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = FileHandle::new("test.txt".to_string());

    // This won't compile!
    // file.read(); // Error: no method named `read` found for `FileHandle<Closed>`

    // On failure `open` returns a (handle, error) tuple, which is not an `Error`
    // itself, so keep only the `io::Error` before applying `?`
    let mut open_file = file.open().map_err(|(_, e)| e)?;
    let content = open_file.read()?; // This works and actually reads the file!
    println!("File content: {}", content);

    let closed_file = open_file.close(); // File resource is properly cleaned up

    // This won't compile either!
    // closed_file.read(); // Error: no method named `read` found for `FileHandle<Closed>`
    Ok(())
}
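Because open hands the closed handle back together with the error, a failed open does not cost us the handle. The following is a rough sketch of how a caller might take advantage of that; it repeats a trimmed copy of the types so it stands alone, and open_with_fallback and the file names are made up for illustration:

```rust
use std::fs::File;
use std::io::{self, Read};
use std::marker::PhantomData;

struct Closed;
struct Open;

struct FileHandle<State> {
    path: String,
    file: Option<File>,
    _state: PhantomData<State>,
}

impl FileHandle<Closed> {
    fn new(path: String) -> Self {
        FileHandle { path, file: None, _state: PhantomData }
    }

    // On failure the closed handle travels back to the caller with the error.
    fn open(self) -> Result<FileHandle<Open>, (Self, io::Error)> {
        match File::open(&self.path) {
            Ok(file) => Ok(FileHandle { path: self.path, file: Some(file), _state: PhantomData }),
            Err(e) => Err((self, e)),
        }
    }
}

impl FileHandle<Open> {
    fn read(&mut self) -> Result<String, io::Error> {
        let file = self.file.as_mut().expect("File must exist in Open state");
        let mut contents = String::new();
        file.read_to_string(&mut contents)?;
        Ok(contents)
    }
}

// Hypothetical helper: try `primary`, and if that fails, report it and try `fallback`.
fn open_with_fallback(primary: &str, fallback: &str) -> Result<FileHandle<Open>, io::Error> {
    match FileHandle::new(primary.to_string()).open() {
        Ok(open) => Ok(open),
        Err((closed, e)) => {
            // The closed handle is still ours, so its path is available for the message.
            eprintln!("could not open {}: {e}", closed.path);
            FileHandle::new(fallback.to_string()).open().map_err(|(_, e)| e)
        }
    }
}

fn main() {
    match open_with_fallback("missing.txt", "test.txt") {
        Ok(mut file) => println!("read {} bytes", file.read().map(|s| s.len()).unwrap_or(0)),
        Err(e) => println!("both paths failed: {e}"),
    }
}
```

The map_err call is the price of returning a tuple: the tuple does not implement Error, so it has to be reduced to the io::Error before it can be propagated.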
Seeing the Compiler Catch the Bug
Let’s see exactly what happens when we try to misuse our API. If we uncomment that first read() call:
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = FileHandle::new("test.txt".to_string());
    file.read(); // Oops!
    Ok(())
}
The compiler immediately catches this:
error[E0599]: no method named `read` found for struct `FileHandle<Closed>` in the current scope
 --> src/main.rs:4:10
  |
4 |     file.read();
  |          ^^^^ method not found in `FileHandle<Closed>`
  |
  = note: the method was found for
          - `FileHandle<Open>`
Similarly, if we try to read after closing:
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = FileHandle::new("test.txt".to_string());
    let open_file = file.open().map_err(|(_, e)| e)?; // keep only the io::Error for `?`
    let closed_file = open_file.close();
    closed_file.read(); // Oops!
    Ok(())
}
We get:
error[E0599]: no method named `read` found for struct `FileHandle<Closed>` in the current scope
 --> src/main.rs:5:17
  |
5 |     closed_file.read();
  |                 ^^^^ method not found in `FileHandle<Closed>`
  |
  = note: the method was found for
          - `FileHandle<Open>`
The compiler not only tells us what’s wrong, but helpfully points out that read does exist, just not for the state we’re in!
Advanced Patterns: Session Types and Protocol Enforcement
Let’s explore a more complex example: a TCP connection state machine.
// Connection states
struct Disconnected;
struct Connected;
struct Authenticated;

struct TcpConnection<State> {
    address: String,
    _state: PhantomData<State>,
}

impl TcpConnection<Disconnected> {
    fn new(address: String) -> Self {
        TcpConnection {
            address,
            _state: PhantomData,
        }
    }

    fn connect(self) -> Result<TcpConnection<Connected>, (Self, std::io::Error)> {
        // Connection logic
        Ok(TcpConnection {
            address: self.address,
            _state: PhantomData,
        })
    }
}

impl TcpConnection<Connected> {
    fn authenticate(self, credentials: &str) -> Result<TcpConnection<Authenticated>, Self> {
        // Authentication logic
        if credentials == "valid" {
            Ok(TcpConnection {
                address: self.address,
                _state: PhantomData,
            })
        } else {
            Err(self)
        }
    }

    fn disconnect(self) -> TcpConnection<Disconnected> {
        TcpConnection {
            address: self.address,
            _state: PhantomData,
        }
    }
}

impl TcpConnection<Authenticated> {
    fn send_data(&self, data: &[u8]) -> Result<(), std::io::Error> {
        // Can only send data when authenticated
        println!("Sending {} bytes", data.len());
        Ok(())
    }

    fn disconnect(self) -> TcpConnection<Disconnected> {
        TcpConnection {
            address: self.address,
            _state: PhantomData,
        }
    }
}
This design enforces a strict protocol:
- Must connect before authenticating
- Must authenticate before sending data
- Can disconnect from any connected state
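Here is one way the protocol might look from the caller's side. This is a condensed sketch rather than the code above verbatim: connect is stubbed and infallible, send_data returns the byte count so there is something to observe, and the private transition helper and the run_session driver are additions made for this example.

```rust
use std::marker::PhantomData;

struct Disconnected;
struct Connected;
struct Authenticated;

struct TcpConnection<State> {
    address: String,
    _state: PhantomData<State>,
}

impl<State> TcpConnection<State> {
    // Private helper (an addition for this sketch): carry the data into a new state.
    // It stays private so callers can only move along the public transitions below.
    fn transition<Next>(self) -> TcpConnection<Next> {
        TcpConnection { address: self.address, _state: PhantomData }
    }
}

impl TcpConnection<Disconnected> {
    fn new(address: String) -> Self {
        TcpConnection { address, _state: PhantomData }
    }

    fn connect(self) -> TcpConnection<Connected> {
        self.transition() // stubbed: no real socket is opened
    }
}

impl TcpConnection<Connected> {
    fn authenticate(self, credentials: &str) -> Result<TcpConnection<Authenticated>, Self> {
        if credentials == "valid" { Ok(self.transition()) } else { Err(self) }
    }
}

impl TcpConnection<Authenticated> {
    fn send_data(&self, data: &[u8]) -> usize {
        println!("Sending {} bytes to {}", data.len(), self.address);
        data.len()
    }

    fn disconnect(self) -> TcpConnection<Disconnected> {
        self.transition()
    }
}

// Hypothetical driver: returns the bytes sent, or None if authentication fails.
fn run_session(credentials: &str) -> Option<usize> {
    let conn = TcpConnection::new("127.0.0.1:8080".to_string());
    // conn.send_data(b"hi"); // would not compile: no `send_data` on `TcpConnection<Disconnected>`
    let conn = conn.connect();
    let conn = conn.authenticate(credentials).ok()?;
    let sent = conn.send_data(b"hello");
    let _conn: TcpConnection<Disconnected> = conn.disconnect();
    Some(sent)
}

fn main() {
    println!("{:?}", run_session("valid"));
    println!("{:?}", run_session("wrong"));
}
```

Each rebinding of conn changes its type, so the order of the calls is checked by the compiler rather than by convention.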
Alternative Approach: Multiple State Types
For more complex state machines, we can also bound the state parameter with a marker trait, so that only types declared as parser states can be plugged in:
trait ParserState {}

struct Initial;
struct ReadingString;
struct ReadingNumber;
struct Complete;
struct Error;

impl ParserState for Initial {}
impl ParserState for ReadingString {}
impl ParserState for ReadingNumber {}
impl ParserState for Complete {}
impl ParserState for Error {}

struct Parser<S: ParserState> {
    input: String,
    position: usize, // byte offset into `input`
    _state: PhantomData<S>,
}

impl Parser<Initial> {
    fn new(input: String) -> Self {
        Parser {
            input,
            position: 0,
            _state: PhantomData,
        }
    }

    fn start_string(self) -> Parser<ReadingString> {
        Parser {
            input: self.input,
            position: self.position,
            _state: PhantomData,
        }
    }

    fn start_number(self) -> Parser<ReadingNumber> {
        Parser {
            input: self.input,
            position: self.position,
            _state: PhantomData,
        }
    }
}

impl Parser<ReadingString> {
    fn read_char(&mut self) -> Option<char> {
        // Advance by the character's UTF-8 length so `position` stays a valid
        // byte offset for the slice taken in `result`
        let c = self.input[self.position..].chars().next()?;
        self.position += c.len_utf8();
        Some(c)
    }

    fn finish(self) -> Parser<Complete> {
        Parser {
            input: self.input,
            position: self.position,
            _state: PhantomData,
        }
    }
}

impl Parser<Complete> {
    fn result(&self) -> &str {
        &self.input[..self.position]
    }
}
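A plain marker trait like ParserState can still be implemented by anyone who can name it. If that matters, one common refinement is to seal the trait. The sketch below assumes the parser lives in its own module; the parser and sealed module names are placeholders, and the parser is cut down to two states to keep it short:

```rust
mod parser {
    use std::marker::PhantomData;

    // Private module: code outside `parser` cannot name `Sealed`, so it cannot
    // implement `ParserState` for its own types.
    mod sealed {
        pub trait Sealed {}
    }

    pub trait ParserState: sealed::Sealed {}

    pub struct Initial;
    pub struct Complete;

    impl sealed::Sealed for Initial {}
    impl sealed::Sealed for Complete {}
    impl ParserState for Initial {}
    impl ParserState for Complete {}

    pub struct Parser<S: ParserState> {
        input: String,
        _state: PhantomData<S>,
    }

    impl Parser<Initial> {
        pub fn new(input: String) -> Self {
            Parser { input, _state: PhantomData }
        }

        pub fn finish(self) -> Parser<Complete> {
            Parser { input: self.input, _state: PhantomData }
        }
    }

    impl Parser<Complete> {
        pub fn result(&self) -> &str {
            &self.input
        }
    }
}

// struct Rogue;
// impl parser::ParserState for Rogue {} // would not compile: `Rogue` cannot implement the private `Sealed` supertrait

fn main() {
    let done = parser::Parser::new("hello".to_string()).finish();
    println!("{}", done.result());
}
```

With the trait sealed, the set of states is closed: the only values of Parser<S> that can exist are the ones the module itself knows how to produce.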
Combining with Result Types for Fallible Transitions
Real-world state machines often have fallible transitions. We can combine our typestate pattern with Result types:
#[derive(Debug)]
enum ConnectionError {
    NetworkError,
    AuthenticationFailed,
    Timeout,
}

impl TcpConnection<Connected> {
    fn authenticate_fallible(
        self,
        credentials: &str
    ) -> Result<TcpConnection<Authenticated>, (Self, ConnectionError)> {
        if credentials.is_empty() {
            return Err((self, ConnectionError::AuthenticationFailed));
        }
        // Simulate network failure
        if credentials == "network_fail" {
            return Err((self, ConnectionError::NetworkError));
        }
        Ok(TcpConnection {
            address: self.address,
            _state: PhantomData,
        })
    }
}
This pattern ensures that:
- On success, we get the desired state
- On failure, we get back the original state and can retry or handle the error
- We never lose our connection object
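The retry case is worth spelling out. Below is a small sketch of a caller that keeps trying credentials with the connection it gets back on each failure. The types are a trimmed copy of the ones above, while connected, address, and authenticate_with_any are hypothetical helpers added for this example:

```rust
use std::marker::PhantomData;

struct Connected;
struct Authenticated;

#[derive(Debug, PartialEq)]
enum ConnectionError {
    NetworkError,
    AuthenticationFailed,
}

struct TcpConnection<State> {
    address: String,
    _state: PhantomData<State>,
}

impl<State> TcpConnection<State> {
    fn address(&self) -> &str {
        &self.address
    }
}

impl TcpConnection<Connected> {
    fn authenticate_fallible(
        self,
        credentials: &str,
    ) -> Result<TcpConnection<Authenticated>, (Self, ConnectionError)> {
        if credentials.is_empty() {
            return Err((self, ConnectionError::AuthenticationFailed));
        }
        // Simulate network failure
        if credentials == "network_fail" {
            return Err((self, ConnectionError::NetworkError));
        }
        Ok(TcpConnection { address: self.address, _state: PhantomData })
    }
}

// Hypothetical constructor standing in for a real `connect` step.
fn connected(address: &str) -> TcpConnection<Connected> {
    TcpConnection { address: address.to_string(), _state: PhantomData }
}

// Hypothetical helper: try each candidate, reusing the connection returned on failure.
fn authenticate_with_any(
    mut conn: TcpConnection<Connected>,
    candidates: &[&str],
) -> Result<TcpConnection<Authenticated>, (TcpConnection<Connected>, ConnectionError)> {
    let mut last_error = ConnectionError::AuthenticationFailed;
    for credentials in candidates {
        match conn.authenticate_fallible(credentials) {
            Ok(authed) => return Ok(authed),
            Err((returned, e)) => {
                conn = returned; // the failed attempt gave the connection back
                last_error = e;
            }
        }
    }
    Err((conn, last_error))
}

fn main() {
    match authenticate_with_any(connected("10.0.0.1:22"), &["", "network_fail", "secret"]) {
        Ok(conn) => println!("authenticated against {}", conn.address()),
        Err((conn, e)) => println!("gave up on {}: {:?}", conn.address(), e),
    }
}
```

If authenticate_fallible returned only the error, the loop could not continue after the first failure, because the connection would already have been consumed.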
Builder Pattern with Typestate
The typestate pattern works excellently with builders, ensuring required fields are set:
struct HttpRequestBuilder<HasUrl, HasMethod> {
    url: Option<String>,
    method: Option<String>,
    headers: Vec<(String, String)>,
    _has_url: PhantomData<HasUrl>,
    _has_method: PhantomData<HasMethod>,
}

struct Yes;
struct No;

impl HttpRequestBuilder<No, No> {
    fn new() -> Self {
        HttpRequestBuilder {
            url: None,
            method: None,
            headers: Vec::new(),
            _has_url: PhantomData,
            _has_method: PhantomData,
        }
    }
}

impl<HasMethod> HttpRequestBuilder<No, HasMethod> {
    fn url(self, url: String) -> HttpRequestBuilder<Yes, HasMethod> {
        HttpRequestBuilder {
            url: Some(url),
            method: self.method,
            headers: self.headers,
            _has_url: PhantomData,
            _has_method: PhantomData,
        }
    }
}

impl<HasUrl> HttpRequestBuilder<HasUrl, No> {
    fn method(self, method: String) -> HttpRequestBuilder<HasUrl, Yes> {
        HttpRequestBuilder {
            url: self.url,
            method: Some(method),
            headers: self.headers,
            _has_url: PhantomData,
            _has_method: PhantomData,
        }
    }
}

impl<HasUrl, HasMethod> HttpRequestBuilder<HasUrl, HasMethod> {
    fn header(mut self, key: String, value: String) -> Self {
        self.headers.push((key, value));
        self
    }
}

// Only builders with both URL and method can build
impl HttpRequestBuilder<Yes, Yes> {
    fn build(self) -> HttpRequest {
        HttpRequest {
            // The Yes markers guarantee both fields were set, so these unwraps cannot fail
            url: self.url.unwrap(),
            method: self.method.unwrap(),
            headers: self.headers,
        }
    }
}

struct HttpRequest {
    url: String,
    method: String,
    headers: Vec<(String, String)>,
}
Usage:
fn main() {
    let request = HttpRequestBuilder::new()
        .url("https://api.example.com".to_string())
        .method("GET".to_string())
        .header("Authorization".to_string(), "Bearer token".to_string())
        .build(); // This compiles!

    // This won't compile - missing method:
    // let invalid = HttpRequestBuilder::new()
    //     .url("https://api.example.com".to_string())
    //     .build(); // Error!
}
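The builder above still stores Option fields and unwraps them in build. A variation worth considering, sketched here under the assumption that the type parameters may hold the values themselves, avoids both the Option and the unwrap. The Missing marker and the example_request helper are names invented for this example:

```rust
// Marker for a field that has not been provided yet.
struct Missing;

struct HttpRequest {
    url: String,
    method: String,
    headers: Vec<(String, String)>,
}

// The type parameters are the field types: `Missing` until set, `String` afterwards.
struct HttpRequestBuilder<U, M> {
    url: U,
    method: M,
    headers: Vec<(String, String)>,
}

impl HttpRequestBuilder<Missing, Missing> {
    fn new() -> Self {
        HttpRequestBuilder { url: Missing, method: Missing, headers: Vec::new() }
    }
}

impl<M> HttpRequestBuilder<Missing, M> {
    fn url(self, url: String) -> HttpRequestBuilder<String, M> {
        HttpRequestBuilder { url, method: self.method, headers: self.headers }
    }
}

impl<U> HttpRequestBuilder<U, Missing> {
    fn method(self, method: String) -> HttpRequestBuilder<U, String> {
        HttpRequestBuilder { url: self.url, method, headers: self.headers }
    }
}

impl<U, M> HttpRequestBuilder<U, M> {
    fn header(mut self, key: String, value: String) -> Self {
        self.headers.push((key, value));
        self
    }
}

// Only a builder holding two real strings can build, and no unwrap is needed.
impl HttpRequestBuilder<String, String> {
    fn build(self) -> HttpRequest {
        HttpRequest { url: self.url, method: self.method, headers: self.headers }
    }
}

// Hypothetical helper that assembles a request for demonstration.
fn example_request() -> HttpRequest {
    HttpRequestBuilder::new()
        .method("GET".to_string())
        .header("Accept".to_string(), "application/json".to_string())
        .url("https://api.example.com".to_string())
        .build()
}

fn main() {
    let request = example_request();
    println!("{} {} ({} headers)", request.method, request.url, request.headers.len());
}
```

The trade-off is that the builder's type now reveals its field types, which may or may not be something you want in a public API.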
Performance Considerations
One of the beautiful aspects of these patterns is that the state markers have zero runtime cost. PhantomData is a zero-sized type, so it adds no bytes to the struct and is optimized away completely.
Let’s verify this:
use std::mem;

fn main() {
    println!("Size of FileHandle<Closed>: {}", mem::size_of::<FileHandle<Closed>>());
    println!("Size of FileHandle<Open>: {}", mem::size_of::<FileHandle<Open>>());
    println!("Size of PhantomData<Closed>: {}", mem::size_of::<PhantomData<Closed>>());
}
The two FileHandle sizes are identical, covering just the path and file fields, while PhantomData<Closed> prints 0: the state parameter adds nothing to the layout.
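The same claim can be turned into assertions instead of printed numbers. This sketch uses a hypothetical Handle type with a single String field, so the expected size can be stated without depending on the platform:

```rust
use std::marker::PhantomData;
use std::mem::size_of;

struct Closed;
struct Open;

// Hypothetical handle: one data field plus the zero-sized state marker.
struct Handle<State> {
    path: String,
    _state: PhantomData<State>,
}

fn main() {
    // The marker itself occupies no space.
    assert_eq!(size_of::<PhantomData<Closed>>(), 0);
    // So the handle is exactly as large as its only real field, in either state.
    assert_eq!(size_of::<Handle<Closed>>(), size_of::<String>());
    assert_eq!(size_of::<Handle<Open>>(), size_of::<String>());
    println!("state markers add no bytes");
}
```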
When to Use This Pattern
The typestate pattern is particularly valuable when:
- State transitions are well-defined and finite
- Invalid operations would be serious bugs
- API misuse is a common source of errors
- Performance is critical (zero-cost abstractions)
- Documentation through types is valuable
Consider using it for:
- Network protocols and connection states
- File handles and resource management
- Parser state machines
- API clients with authentication flows
- Game state management
- Hardware driver interfaces
Limitations and Trade-offs
While powerful, this pattern has some limitations:
- Increased compile times due to more complex type checking
- Code duplication if states share many methods
- Learning curve for API users unfamiliar with the pattern
- Inflexibility - runtime state changes require design changes
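Two of these limitations have fairly standard mitigations, sketched below under simplified assumptions. Methods shared by every state can live in a single generic impl, and values whose state is only known at runtime (a pool of connections, say) can be wrapped in an enum. The Connection and AnyConnection types and the live_addresses helper are made up for this illustration:

```rust
use std::marker::PhantomData;

struct Disconnected;
struct Connected;

struct Connection<State> {
    address: String,
    _state: PhantomData<State>,
}

// Shared by every state: written once in a generic impl instead of once per state.
impl<State> Connection<State> {
    fn address(&self) -> &str {
        &self.address
    }
}

impl Connection<Disconnected> {
    fn new(address: &str) -> Self {
        Connection { address: address.to_string(), _state: PhantomData }
    }

    fn connect(self) -> Connection<Connected> {
        Connection { address: self.address, _state: PhantomData }
    }
}

// Runtime-tagged wrapper (hypothetical) for values whose state is only known at runtime.
enum AnyConnection {
    Idle(Connection<Disconnected>),
    Live(Connection<Connected>),
}

impl AnyConnection {
    fn is_live(&self) -> bool {
        matches!(self, AnyConnection::Live(_))
    }

    fn address(&self) -> &str {
        match self {
            AnyConnection::Idle(c) => c.address(),
            AnyConnection::Live(c) => c.address(),
        }
    }
}

// Hypothetical helper: addresses of the connections that are currently live.
fn live_addresses(pool: &[AnyConnection]) -> Vec<&str> {
    pool.iter().filter(|c| c.is_live()).map(|c| c.address()).collect()
}

fn main() {
    let pool = vec![
        AnyConnection::Idle(Connection::new("10.0.0.1:80")),
        AnyConnection::Live(Connection::new("10.0.0.2:80").connect()),
    ];
    println!("{:?}", live_addresses(&pool));
}
```

The enum brings back a runtime check at the point where the state is inspected, but everything past the match arm is statically typed again.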
Conclusion
The typestate pattern represents one of Rust’s most powerful compile-time guarantees. By encoding state into the type system, we move from “it might work” to “it will work” - transforming runtime errors into compile-time impossibilities.
When designing APIs, consider how you can make invalid states unrepresentable. Your future self (and your users) will thank you.
The next time you find yourself writing runtime checks for object state, ask: “Could I encode this in the type system instead?” Often, the answer is yes, and the result is more robust, self-documenting code that catches bugs before they happen.