I’ve been doing the entirety of my baseball coding in Rust for quite some time now. At some point in my mid-thirties, I decided it was time for me to learn a systems programming language in which I could write high-performance software. I was terrified of C and C++, as those are languages where even experts make critical mistakes. After much research, I settled on Rust as my language of choice.
Today I’ll give a brief overview of my code, which you can find here:
Why Rust?
#1 Community
I’m speaking extremely broadly here, but many software ecosystems in the systems space can be unwelcoming to people who are not cisgender men. Many of the key contributors to Rust were possibly people who felt unwelcome in other communities. As such, a key focus of Rust in its early days was providing an inclusive environment, which for me meant that my complete lack of systems programming knowledge would not be a barrier to learning Rust.
This is Rust’s motto:
A language empowering everyone
to build reliable and efficient software.
Everyone. I want to reiterate that I had *zero* systems-level programming experience. The last serious coding I had done was hacking Nibbles.bas in MS-DOS back when I was in high school. I think I created an AI snake for that, as well as random levels, and a save/load mechanism. Man, I’m getting old.
I spent some time reading The Book, and if you are Rust-curious, I would encourage you to start there as well. It does not assume you are a systems programmer with years of domain knowledge.
#2 Cargo - Rust’s Package Manager
Rust comes with Cargo, which makes compiling and pulling in dependencies really easy. You can see the list of dependencies that I pull in for BOSS in the snippet below:
[dependencies]
reqwest = { version = "0.11", features = ["blocking"] }
csv = { version = "1.1" }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1"
rayon = "1.6"
regex = "1"
The file is Cargo.toml in the root directory of the GitHub repo linked above. Here’s what each of the packages does:
Reqwest pulls data from the internet. I use the “blocking” feature, which means it makes synchronous requests, which are much simpler than asynchronous requests. That’s a whole post unto itself, but the short explanation is that a synchronous operation will wait until it is done before your program does anything else.
CSV handles reading and writing to CSV files. My CSV file is currently roughly 70 GB of raw data. BOSS keeps track of which games have been pulled and simply appends the records to the end of the CSV file. Since Rust is extremely structured, I never have to worry about the columns changing, unless I’ve changed them.
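To make the append-only idea concrete, here is a minimal sketch that uses only the standard library. The `PlayRecord` struct, its three columns, and the `append_new_games` helper are hypothetical names invented for illustration; BOSS itself uses the csv crate and tracks far more columns, and this is not its actual code.

```rust
use std::collections::HashSet;
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical record type; the real rows have many more columns.
pub struct PlayRecord {
    pub game_pk: u32,
    pub inning: u8,
    pub event: String,
}

impl PlayRecord {
    /// The column order is defined in exactly one place, so it only
    /// changes when this function changes.
    /// Assumption: `event` contains no commas or quotes. The csv crate
    /// handles that escaping; this sketch does not.
    fn to_csv_row(&self) -> String {
        format!("{},{},{}", self.game_pk, self.inning, self.event)
    }
}

/// Appends every record whose game is not already in `seen`, then marks
/// those games as seen. Returns the number of rows written.
pub fn append_new_games(
    path: &Path,
    seen: &mut HashSet<u32>,
    records: &[PlayRecord],
) -> io::Result<usize> {
    // `append(true)` means existing rows are never rewritten.
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    let mut newly_written = HashSet::new();
    let mut rows = 0;
    for record in records {
        if seen.contains(&record.game_pk) {
            continue; // this game was pulled on an earlier run
        }
        writeln!(file, "{}", record.to_csv_row())?;
        newly_written.insert(record.game_pk);
        rows += 1;
    }
    // Only mark games as seen once the whole batch has been written.
    seen.extend(newly_written);
    Ok(rows)
}
```

Calling `append_new_games` twice with the same batch writes the rows once: the second call finds every `game_pk` in `seen` and appends nothing.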
Serde and Serde_JSON provide easy serialization and deserialization of JSON and other formats. This means all we need to do is specify the structure of a JSON file, and Serde handles all the parsing logic. It’s basically magic.
Here’s what that looks like:
#[derive(Deserialize, Serialize, Debug)]
#[serde(from = "Schedule")]
pub struct Schedule {
    pub games: Games,
}
and then we also describe the game:
#[derive(Deserialize, Debug)]
pub(crate) struct GameDe {
    #[serde(alias = "gameType")]
    game_type: GameType,
    #[serde(alias = "gamePk")]
    game_pk: u32,
    #[serde(alias = "gameDate")]
    game_date: String,
    teams: Teams,
    venue: VenueID,
    status: GameStatus,
}
How it all works is a little more involved than just that, and hopefully I’ll cover it in full detail in a tutorial.
Rayon provides parallel execution. I can simply change a .iter() call into a .par_iter() call, and voila, it safely executes the same operation across all available cores. I don’t need to know how to do safe multithreading; I just have to set things up so they can use Rust’s iterator idioms and I’m good to go. In Rust, this is referred to as fearless concurrency.
Regex provides regex functionality. I don’t think I’m actually using it for anything; I must have pulled it in a while back. A neat feature of Rust is that if I were to delete it, the compiler would show me every line where my code is now broken.
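To show roughly what Rayon takes off your hands, here is a standard-library-only sketch of the same idea: splitting a slice into chunks and summing each chunk on its own thread. This swaps in `std::thread::scope` in place of Rayon, and the function names and the "pitches per game" data are hypothetical, chosen only for illustration.

```rust
use std::thread;

/// Serial version: an ordinary iterator chain.
pub fn total_pitches_serial(pitches_per_game: &[u64]) -> u64 {
    pitches_per_game.iter().sum()
}

/// Hand-rolled parallel version using scoped threads.
/// `workers` is the number of threads to aim for (treated as at least 1).
pub fn total_pitches_parallel(pitches_per_game: &[u64], workers: usize) -> u64 {
    if pitches_per_game.is_empty() {
        return 0;
    }
    let workers = workers.max(1);
    // Ceiling division so every element lands in exactly one chunk.
    let chunk_size = (pitches_per_game.len() + workers - 1) / workers;

    thread::scope(|scope| {
        // Spawn one thread per chunk. The compiler checks that the borrowed
        // slice outlives every thread in the scope.
        let handles: Vec<_> = pitches_per_game
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<u64>()))
            .collect();

        // Wait for each partial sum and add them together.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}
```

With Rayon in scope, all of the chunking and joining collapses to something like `pitches_per_game.par_iter().sum()`, which is the one-word change described above.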
#3 Protection from Memory Bugs
This is a highly technical subject, so I’ll try to stay high level. Even the best of the best C/C++ programmers make mistakes when managing memory. This has led to code that can do strange things, because managing memory is a really hard problem to solve. In safe Rust, you almost never have to worry about it. The compiler rules out a whole class of memory safety problems, at the cost of being pedantic: it politely asks you to fix your code before it will compile it for you.
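As a small illustration of that pedantry, the sketch below shows ownership in action. The function names and the sample ids are made up for this example. The commented-out line is the kind of use-after-free mistake that C would happily compile and Rust refuses to.

```rust
/// Takes ownership of the vector. When this function returns, the memory
/// is freed exactly once, with no manual cleanup.
pub fn consume_and_count(player_ids: Vec<u32>) -> usize {
    player_ids.len()
}

/// Borrows the data instead, so the caller keeps ownership.
pub fn count(player_ids: &[u32]) -> usize {
    player_ids.len()
}

pub fn demo() -> (usize, usize) {
    let ids = vec![1, 2, 3];

    // Fine: `ids` is only borrowed, so it is still usable afterwards.
    let borrowed = count(&ids);

    // `ids` is moved into the function and freed when it returns.
    let moved = consume_and_count(ids);

    // Uncommenting the next line is a compile error (E0382, borrow of
    // moved value), because `ids` no longer owns any memory:
    // let oops = count(&ids);

    (borrowed, moved)
}
```

Nothing here is checked at run time; the rejected line never makes it into a binary at all.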
BOSS - How it Works
Everything starts with the schedule API. This is how we get a list of games that we need. From there, we pull metadata, such as player info, boxscore info, and venue info. Then we parse the play-by-play data. That’s an extremely short description of what’s going on, but getting into the details would require a much longer post. If you want to trace how all the code connects, you can start with the main.rs file.
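The three steps above can be sketched as a tiny pipeline. Everything in this block is hypothetical: the `StatsSource` trait, the `ParsedGame` struct, and the in-memory `FakeSource` are stand-ins invented for illustration, with metadata reduced to a single venue string. The real code talks to the network and is organized differently.

```rust
/// Hypothetical stand-in for the remote API.
pub trait StatsSource {
    /// Step 1: the schedule yields the ids of the games we need.
    fn schedule(&self) -> Vec<u32>;
    /// Step 2: metadata for one game (reduced here to a venue name).
    fn venue(&self, game_pk: u32) -> String;
    /// Step 3: raw play-by-play events for one game.
    fn play_by_play(&self, game_pk: u32) -> Vec<String>;
}

#[derive(Debug, PartialEq)]
pub struct ParsedGame {
    pub game_pk: u32,
    pub venue: String,
    pub plays: Vec<String>,
}

/// Walks the schedule and gathers metadata and plays for each game.
pub fn run_pipeline<S: StatsSource>(source: &S) -> Vec<ParsedGame> {
    source
        .schedule()
        .into_iter()
        .map(|game_pk| ParsedGame {
            game_pk,
            venue: source.venue(game_pk),
            plays: source.play_by_play(game_pk),
        })
        .collect()
}

/// In-memory fake so the sketch runs without network access.
pub struct FakeSource;

impl StatsSource for FakeSource {
    fn schedule(&self) -> Vec<u32> {
        vec![1, 2]
    }
    fn venue(&self, game_pk: u32) -> String {
        format!("venue-{}", game_pk)
    }
    fn play_by_play(&self, game_pk: u32) -> Vec<String> {
        vec![format!("game {}: first pitch", game_pk)]
    }
}
```

Putting the data source behind a trait is a design choice made for this sketch: it lets the pipeline be exercised with `run_pipeline(&FakeSource)` without touching the network.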
Try it out
If you want to try it out, but are struggling with the code, I’ll be happy to walk you through how to use it yourself. There are probably a couple of places where I have me-specific stuff hard-coded, but that should be pretty easy to fix. DM me on Twitter and I’ll be delighted to help you get started with Rust. You might even find a bug (or several bugs) by using the code.