Command-line apps in Rust

Rust is a statically compiled, fast language with great tooling and a rapidly growing ecosystem.

That makes it a great fit for writing command-line applications: They are small, portable, and fast. A command-line application is also a great way to start learning Rust, or to introduce Rust to your team.

Writing a simple command-line interface (CLI) program is a great exercise for beginners who are new to Rust and want to get a feel for it. There are many aspects to it, though, which we will cover in the later chapters.

The outline of this book is as follows: We start with a quick tutorial, at the end of which you'll have a working CLI tool.

Through the tutorial, you'll come into contact with some of Rust's core concepts as well as the main parts of a CLI application. The chapters that follow go into the implementation details of those aspects.

Last but not least: If you find an error in this book, or want to contribute to and enrich its content, the project's source is in the CLI WG repository. We look forward to your feedback!


Building a command-line app in 15 minutes

This tutorial will guide you through building a CLI (command-line interface) application in Rust. It takes only about fifteen minutes until you have a running program (somewhere around section 1.3).

After that, we'll keep tweaking our program until we reach a point where it can be packaged and released as a tool.

You'll learn all the essentials you need to get going, and where to find more information. Feel free to skip parts you don't need to know right now, or jump back in at any later point.

What kind of project do you want to write?

How about we start with something simple: Let's write a small grep clone.

We give this tool a string and a file path, and it will print every line that contains the given string. Let's call it grrs (pronounced "grass").

In the end, we want to be able to run it like this:

$ cat test.txt
foo: 10
bar: 20
baz: 30
$ grrs foo test.txt
foo: 10
$ grrs --help
[some help text explaining the available options]

Project setup

If you haven't already, install Rust on your computer (it should only take a few minutes).

Then, open a terminal and navigate to the directory you want to put your application's code in.

Start by running cargo new grrs in the directory you store your projects in.

If you look at the newly created grrs directory, you'll find the default setup of a Rust project:

  • Cargo.toml contains the metadata of our project, including the list of dependencies/external libraries we use.
  • src/main.rs is the entry point of our program's binary (the main program).

If you can run cargo run in the grrs directory and get a Hello, World!, you're all set.

This is what it can look like:

$ cargo new grrs
     Created binary (application) `grrs` package
$ cd grrs/
$ cargo run
   Compiling grrs v0.1.0 (/Users/pascal/code/grrs)
    Finished dev [unoptimized + debuginfo] target(s) in 0.70s
     Running `target/debug/grrs`
Hello, world!

Parsing command-line arguments

A typical invocation of our CLI tool will look like this:

$ grrs foobar test.txt

We expect our program to look at test.txt and print out the lines that contain foobar.

But how do we get these two values?

The text after the name of the program on the command line is often called the "command-line arguments" or "command-line flags" (especially when they look like --this).

Internally, the operating system usually represents them as a list of strings (roughly speaking, separated by spaces).

There are many ways to recognize these arguments and parse them into something that is easier to work with.

You will also need to tell the users of your program which arguments they need to give, and in which format they are expected.

Getting the arguments

The standard library contains the function std::env::args(), which gives you an iterator over the arguments given at runtime.

The first entry (at index 0) will be the name your program was called as (e.g. grrs), and the ones that follow are what the user wrote afterwards.
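A quick way to see that list yourself (program name first, user arguments after) is to collect the iterator into a vector. This is just a minimal std-only sketch, not part of our tool:

```rust
fn main() {
    // Collect all arguments the OS handed to us into a vector of strings.
    let args: Vec<String> = std::env::args().collect();
    // Index 0 is conventionally the path/name the program was invoked with.
    println!("program: {}", args[0]);
    // Everything after index 0 is what the user typed.
    println!("user arguments: {:?}", &args[1..]);
}
```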

Getting the raw arguments this way is quite easy (in the fn main() function in src/main.rs):

fn main() {
    let pattern = std::env::args().nth(1).expect("no pattern given");
    let path = std::env::args().nth(2).expect("no path given");

    println!("pattern: {:?}, path: {:?}", pattern, path)
}

We can run this using cargo run, passing our arguments after --:

$ cargo run -- some-pattern some-file
    Finished dev [unoptimized + debuginfo] target(s) in 0.11s
     Running `target/debug/grrs some-pattern some-file`
pattern: "some-pattern", path: "some-file"

Data types for CLI arguments

Instead of thinking of them as a bunch of text, it pays off to think of CLI arguments as a custom data type that represents the inputs to your program.

Look at grrs foobar test.txt: There are two arguments, first the pattern (the string to look for), and then the path (the file to look in).

What more can we say about them?

Well, for a start, both are required: We haven't provided any default values, so the user needs to supply both when using our program.

Furthermore, we can say something about their types: The pattern is expected to be a string, while the second argument is expected to be a path to a file.

In Rust, it is common to structure programs around the data they handle, so this way of looking at CLI arguments fits very well for what comes next.

Let's start with this (in src/main.rs, before fn main() {):

struct Cli {
    pattern: String,
    path: std::path::PathBuf,
}

This defines a new structure (a struct) that has two fields to store data in: pattern and path.

Now, we still need to get the actual arguments our program was given into this form.

One option would be to manually parse the list of strings we get from the operating system and build the struct ourselves.

It would look something like this:

fn main() {
    let pattern = std::env::args().nth(1).expect("no pattern given");
    let path = std::env::args().nth(2).expect("no path given");

    let args = Cli {
        pattern: pattern,
        path: std::path::PathBuf::from(path),
    };

    println!("pattern: {:?}, path: {:?}", args.pattern, args.path);
}

This works, but it's not very convenient. How would you deal with the requirement of supporting --pattern="foo" or --pattern "foo"? How would you implement --help?

Parsing CLI arguments with Clap

A much nicer way is to use one of the many available libraries. clap is the most popular library for parsing command-line arguments. It has all the functionality you'd expect, including support for sub-commands, shell completions, and great help messages.

First, we need to import clap by adding clap = { version = "4.0", features = ["derive"] } to the [dependencies] section of our Cargo.toml file.

Now, we can add use clap::Parser; to our code, and add #[derive(Parser)] right above the struct Cli we created earlier.

We can also write some documentation comments along the way.

Let's start with this (in src/main.rs, before fn main() {):

use clap::Parser;

/// Search for a pattern in a file and display the lines that contain it.
#[derive(Parser)]
struct Cli {
    /// The pattern to look for
    pattern: String,
    /// The path to the file to read
    path: std::path::PathBuf,
}

Right below our Cli struct is our main function. When we execute the program, it will call this function:

fn main() {
    let args = Cli::parse();

    println!("pattern: {:?}, path: {:?}", args.pattern, args.path);
}

This will try to parse the arguments into our Cli struct.

But what if that fails? That's the beauty of this approach: Clap knows which fields to expect and what their expected types are. It can automatically generate a nice --help message, as well as give some great suggestions for errors. For example, it will suggest that you probably meant to write --output when you typed --putput.

Wrapping up

Your code should now look like this:

use clap::Parser;

/// Search for a pattern in a file and display the lines that contain it.
#[derive(Parser)]
struct Cli {
    /// The pattern to look for
    pattern: String,
    /// The path to the file to read
    path: std::path::PathBuf,
}

fn main() {
    let args = Cli::parse();

    println!("pattern: {:?}, path: {:?}", args.pattern, args.path);
}

Running it without any arguments:

$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 10.16s
     Running `target/debug/grrs`
error: The following required arguments were not provided:
    <pattern>
    <path>

USAGE:
    grrs <pattern> <path>

For more information try --help

Running it with arguments:

$ cargo run -- some-pattern some-file
    Finished dev [unoptimized + debuginfo] target(s) in 0.11s
     Running `target/debug/grrs some-pattern some-file`
pattern: "some-pattern", path: "some-file"

The output demonstrates that our program successfully parsed the arguments into the Cli struct.

First implementation of grrs

After the chapter on command-line arguments, we know how to get our input data, and we can now start to write our actual tool.

Our main function currently only contains this line:

    let args = Cli::parse();

(We removed the println statement that we had temporarily put there to demonstrate that our program works as expected.)

Let's start by opening the file we got:

    let content = std::fs::read_to_string(&args.path).expect("could not read file");

Now, let's iterate over the lines and print each one that contains our pattern:

    for line in content.lines() {
        if line.contains(&args.pattern) {
            println!("{}", line);
        }
    }

Wrapping up

Your code should now look like this:

use clap::Parser;

/// Search for a pattern in a file and display the lines that contain it.
#[derive(Parser)]
struct Cli {
    /// The pattern to look for
    pattern: String,
    /// The path to the file to read
    path: std::path::PathBuf,
}

fn main() {
    let args = Cli::parse();
    let content = std::fs::read_to_string(&args.path).expect("could not read file");

    for line in content.lines() {
        if line.contains(&args.pattern) {
            println!("{}", line);
        }
    }
}

Give it a try: Run cargo run -- main src/main.rs and see if it works!

Nicer error reporting

We all can do nothing but accept the fact that errors will occur.

And in contrast to many other languages, it's very hard not to notice and deal with this reality when using Rust: As it doesn't have exceptions, all possible error states are often encoded in the return types of functions.

Results

A function like read_to_string doesn't return a string. Instead, it returns a Result that contains either a String or an error of some type (in this case std::io::Error).

How do you know which it is?

Since Result is an enum, you can use match to check which variant it is:

#![allow(unused)]
fn main() {
let result = std::fs::read_to_string("test.txt");
match result {
    Ok(content) => { println!("File content: {}", content); }
    Err(error) => { println!("Oh noes: {}", error); }
}
}

Unwrapping

Now, we can access the file's content, but we can't really do anything with it after the match block.

For this, we'll need to somehow deal with the error case. The difficulty is that all arms of a match block need to return something of the same type.

But there's a neat trick to get around that:

#![allow(unused)]
fn main() {
let result = std::fs::read_to_string("test.txt");
let content = match result {
    Ok(content) => { content },
    Err(error) => { panic!("Can't deal with {}, just exit here", error); }
};
println!("file content: {}", content);
}

We can use the content after the match block. If result were an error, the string wouldn't exist. But thankfully, the program would exit itself before we ever reach a point where we use content.

This may seem drastic, but it's very useful. If your program needs to read a file and can't do anything if the file doesn't exist, exiting is a perfectly valid and sensible strategy. There's even a shortcut method on Result, called unwrap:

#![allow(unused)]
fn main() {
let content = std::fs::read_to_string("test.txt").unwrap();
}

No need to panic

Of course, aborting the program isn't the only way to deal with errors.

Instead of the panic!, we can also just as easily write return:

fn main() -> Result<(), Box<dyn std::error::Error>> {
let result = std::fs::read_to_string("test.txt");
let content = match result {
    Ok(content) => { content },
    Err(error) => { return Err(error.into()); }
};
Ok(())
}

However, this changes the return type of our function. Indeed, something was hidden in our examples all this time: the function signature (or rather, the return type). In that last example with return, it becomes important. Here's the full example:

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let result = std::fs::read_to_string("test.txt");
    let content = match result {
        Ok(content) => { content },
        Err(error) => { return Err(error.into()); }
    };
    println!("file content: {}", content);
    Ok(())
}

Our function's return type is Result! That is why we can write return Err(error.into()); in the second match arm. See the Ok(()) at the bottom? It is the default return value of the function and means "the result is okay, and it has no content".

The question mark (?)

Just like calling .unwrap() is a shortcut for the match with panic! in the error arm, we have another shortcut for the match that returns in the error arm: ?.

That's right, the question mark. You can append this operator to a value of type Result, and Rust will internally expand it to something very similar to the match block we just wrote.

Give it a try:

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let content = std::fs::read_to_string("test.txt")?;
    println!("file content: {}", content);
    Ok(())
}

Quite concise!

Providing context

Using ? in your main function to get errors works, but it leaves something to be desired. For example: When you run std::fs::read_to_string("test.txt")? but the file test.txt doesn't exist, you get this output:

Error: Os { code: 2, kind: NotFound, message: "No such file or directory" }

In cases where your code doesn't literally contain the file name, it would be very hard to tell which file was NotFound. There are multiple ways to improve on this.

For example, we can create our own error type, and then use it to build a custom error message:

#[derive(Debug)]
struct CustomError(String);

fn main() -> Result<(), CustomError> {
    let path = "test.txt";
    let content = std::fs::read_to_string(path)
        .map_err(|err| CustomError(format!("Error reading `{}`: {}", path, err)))?;
    println!("file content: {}", content);
    Ok(())
}

Now, running this, we'll get our custom error message:

Error: CustomError("Error reading `test.txt`: No such file or directory (os error 2)")

Not very pretty, but we can easily adapt the debug output for our type later on.

This pattern is in fact very common. It has one problem, though: We don't store the original error, only its string representation. The often-used anyhow library has a neat solution for that: Similar to our CustomError type, its Context trait can be used to add a description. Additionally, it also keeps the original error, so we get a "chain" of error messages that points out the root cause.

Let's first import the anyhow crate by adding anyhow = "1.0" to the [dependencies] section of our Cargo.toml file.

The full example will then look like this:

use anyhow::{Context, Result};

fn main() -> Result<()> {
    let path = "test.txt";
    let content = std::fs::read_to_string(path)
        .with_context(|| format!("could not read file `{}`", path))?;
    println!("file content: {}", content);
    Ok(())
}

Which will print an error:

Error: could not read file `test.txt`

Caused by:
    No such file or directory (os error 2)

Wrapping up

Your code should now look like this:

use anyhow::{Context, Result};
use clap::Parser;

/// Search for a pattern in a file and display the lines that contain it.
#[derive(Parser)]
struct Cli {
    /// The pattern to look for
    pattern: String,
    /// The path to the file to read
    path: std::path::PathBuf,
}

fn main() -> Result<()> {
    let args = Cli::parse();

    let content = std::fs::read_to_string(&args.path)
        .with_context(|| format!("could not read file `{}`", args.path.display()))?;

    for line in content.lines() {
        if line.contains(&args.pattern) {
            println!("{}", line);
        }
    }

    Ok(())
}

Output

Printing "Hello World" to the terminal

#![allow(unused)]
fn main() {
println!("Hello World");
}

Well, that was easy.

Great, onto the next topic.

Using println!

You can pretty much print all the things you like with the println! macro.

This macro has some pretty amazing capabilities, but also a special syntax: It expects you to write a string literal as the first parameter, containing placeholders that will be filled in by the values of the parameters that follow as further arguments.

For example:

#![allow(unused)]
fn main() {
let x = 42;
println!("My lucky number is {}.", x);
}

will print

My lucky number is 42.

The curly braces ({}) in the string above are one kind of these placeholders. This is the default placeholder type, which tries to print the given value in a human-readable way. For numbers and strings this works very well, but not all types can do it. This is why there is also a "debug representation" that you can get by using this placeholder: {:?}.

For example:

#![allow(unused)]
fn main() {
let xs = vec![1, 2, 3];
println!("The list is: {:?}", xs);
}

will print

The list is: [1, 2, 3]

If you want your own data types to be printable for debugging and logging, you can in most cases add #[derive(Debug)] above their definition.
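As a minimal sketch of that, with a hypothetical Coordinate type (not part of our tool):

```rust
// Deriving Debug lets the {:?} placeholder print this type.
#[derive(Debug)]
struct Coordinate {
    x: i32,
    y: i32,
}

fn main() {
    let position = Coordinate { x: 3, y: 4 };
    // {:?} uses the derived debug representation of the struct.
    println!("position: {:?}", position);
    // prints: position: Coordinate { x: 3, y: 4 }
}
```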

Printing errors

Printing errors should be done via stderr, to make it easier for users and other tools to pipe their outputs to files or more tools.

In Rust this is achieved with println! and eprintln!: The former prints to stdout, the latter to stderr.

#![allow(unused)]
fn main() {
println!("This is information");
eprintln!("This is an error! :(");
}

A note on printing performance

Printing to the terminal is surprisingly slow! If you call things like println! in a loop, it can easily become a bottleneck in an otherwise fast program. There are two things you can do to speed this up.

First, you might want to reduce the number of writes that actually "flush" to the terminal. println! tells the system to flush to the terminal every time, because it is common to print each new line. If you don't need that, you can wrap the stdout handle in a BufWriter, which buffers up to 8 kB by default. (You can still call .flush() on the BufWriter when you want to print immediately.)

#![allow(unused)]
fn main() {
use std::io::{self, Write};

let stdout = io::stdout(); // get the global stdout entity
let mut handle = io::BufWriter::new(stdout); // optional: wrap that handle in a buffer
writeln!(handle, "foo: {}", 42); // add `?` if you care about errors here
}

Second, it helps to acquire a lock on stdout (or stderr) and use writeln! to print to it directly. This prevents the system from locking and unlocking stdout over and over again.

#![allow(unused)]
fn main() {
use std::io::{self, Write};

let stdout = io::stdout(); // get the global stdout entity
let mut handle = stdout.lock(); // acquire a lock on it
writeln!(handle, "foo: {}", 42); // add `?` if you care about errors here
}

You can also combine the two approaches.
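Putting the lock and the buffer together might look like the following sketch: lock once, then wrap the lock in a BufWriter so writes are both unlocked-free and batched.

```rust
use std::io::{self, Write};

fn main() {
    let stdout = io::stdout(); // get the global stdout entity
    // Acquire the lock once, then wrap the locked handle in a buffer:
    // one lock for the whole loop, and few actual flushes to the terminal.
    let mut handle = io::BufWriter::new(stdout.lock());
    for i in 0..3 {
        writeln!(handle, "foo: {}", i).expect("write failed");
    }
    // The buffer also flushes when `handle` is dropped; this makes it explicit.
    handle.flush().expect("flush failed");
}
```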

Showing a progress bar

Some CLI applications run for a long time and take several minutes or even hours. If you are writing one of these programs, you might want to show the user that something is happening. To this end, you should try to print useful status update messages, ideally in a form that can be easily consumed.

Using the indicatif crate, you can add progress bars to your program. Here's a quick example:

fn main() {
    let pb = indicatif::ProgressBar::new(100);
    for i in 0..100 {
        do_hard_work();
        pb.println(format!("[+] finished #{}", i));
        pb.inc(1);
    }
    pb.finish_with_message("done");
}

See the documentation and examples of indicatif for details.

Logging

To make it easier to understand what is happening in our program, we might want to add some log statements. This is easy to do while writing the application, and it becomes super helpful when running the program again half a year later. In some regard, logging is similar to using println!, except that you can specify the importance (level) of a message. The levels you can usually use are error, warn, info, debug, and trace (error has the highest priority, trace the lowest).

To add simple logging to your application, you need two things: The log crate (which contains macros named after the log levels) and an adapter that actually writes the log output somewhere useful. The use of log adapters is very flexible: You could, for example, use them to write logs not only to the terminal but also to syslog, or to a central log server.

Since we are only concerned with writing a CLI application right now, an easy adapter to use is env_logger. It's called the "env" logger because you can use an environment variable to specify which parts of your app you want to log, and at which level. It will prefix your log messages with a timestamp and the module the log message comes from. Since libraries can also use log, you can easily configure their log output, too.

Here's a quick example:

use log::{info, warn};

fn main() {
    env_logger::init();
    info!("starting up");
    warn!("oops, nothing implemented!");
}

If you have this file as src/bin/output-log.rs, on Linux and macOS you can run it like this:

$ env RUST_LOG=info cargo run --bin output-log
    Finished dev [unoptimized + debuginfo] target(s) in 0.17s
     Running `target/debug/output-log`
[2018-11-30T20:25:52Z INFO  output_log] starting up
[2018-11-30T20:25:52Z WARN  output_log] oops, nothing implemented!

In Windows PowerShell, you can run it like this:

$ $env:RUST_LOG="info"
$ cargo run --bin output-log
    Finished dev [unoptimized + debuginfo] target(s) in 0.17s
     Running `target/debug/output-log.exe`
[2018-11-30T20:25:52Z INFO  output_log] starting up
[2018-11-30T20:25:52Z WARN  output_log] oops, nothing implemented!

In the Windows CMD (command prompt), you can run it like this:

$ set RUST_LOG=info
$ cargo run --bin output-log
    Finished dev [unoptimized + debuginfo] target(s) in 0.17s
     Running `target/debug/output-log.exe`
[2018-11-30T20:25:52Z INFO  output_log] starting up
[2018-11-30T20:25:52Z WARN  output_log] oops, nothing implemented!

RUST_LOG is the name of the environment variable you can use to set your log settings. env_logger also contains a builder, so you can programmatically adjust these settings and, for example, also show info level messages by default.

There are a lot of alternative logging adapters out there, and also alternatives or extensions to log. If you know your application will have a lot to log, make sure to review them, and make your users' lives easier.

Testing

Over decades of software development, people have discovered one truth: Untested software rarely works. (Many people would go as far as saying: "Most tested software doesn't work either." But we are all optimists here, right?)

So, to ensure that your program does what you expect it to do, it is wise to test it.

One easy way is to write a README file that describes what your program should do. And when you feel ready to make a new release, go through the README and make sure that the behavior is still as expected. You can make this a more rigorous exercise by also writing down how your program should react to erroneous inputs.

Here's another crazy idea: Write that README before you write the code.

Automated testing

Now, this is all fine and dandy, but doing all of this manually? That can take a lot of time. At the same time, many people have come to enjoy telling computers to do things for them. Let's talk about how to automate these tests.

Rust has a built-in test framework, so let's start by writing a first test:

fn answer() -> i32 {
  42
}

#[test]
fn check_answer_validity() {
    assert_eq!(answer(), 42);
}

You can put this code in pretty much any file, and cargo test will find and run it. The key here is the #[test] attribute. It allows the build system to discover such functions and run them as tests, verifying that they don't panic.

Now that we've seen how we can write tests, we still need to figure out what to test. As you've seen, it's fairly easy to write assertions for functions. But a CLI application is often more than one function! Worse, it often deals with user input, reads files, and writes output.

Making your code testable

There are two complementary approaches to testing functionality: Testing the small units that you build your complete application from, called "unit tests", and testing the final application "from the outside", called "black box tests" or "integration tests".

Let's begin with the first one.

To figure out what we should test, let’s see what our program features are. Mainly, grrs is supposed to print out the lines that match a given pattern. So, let’s write unit tests for exactly this: We want to ensure that our most important piece of logic works, and we want to do it in a way that is not dependent on any of the setup code we have around it (that deals with CLI arguments, for example).

Going back to our first implementation of grrs, we added this block of code to the main function:

// ...
for line in content.lines() {
    if line.contains(&args.pattern) {
        println!("{}", line);
    }
}

Sadly, this is not very easy to test. First of all, it’s in the main function, so we can’t easily call it. This is easily fixed by moving this piece of code into a function:

#![allow(unused)]
fn main() {
fn find_matches(content: &str, pattern: &str) {
    for line in content.lines() {
        if line.contains(pattern) {
            println!("{}", line);
        }
    }
}
}

Now we can call this function in our test, and see what its output is:

#[test]
fn find_a_match() {
    find_matches("lorem ipsum\ndolor sit amet", "lorem");
    assert_eq!( // uhhhh

Or… can we? Right now, find_matches prints directly to stdout, i.e., the terminal. We can’t easily capture this in a test! This is a problem that often comes up when writing tests after the implementation: We have written a function that is firmly integrated in the context it is used in.

Alright, how can we make this testable? We’ll need to capture the output somehow. Rust’s standard library has some neat abstractions for dealing with I/O (input/output) and we’ll make use of one called std::io::Write. This is a trait that abstracts over things we can write to, which includes strings but also stdout.

If this is the first time you’ve heard “trait” in the context of Rust, you are in for a treat. Traits are one of the most powerful features of Rust. You can think of them like interfaces in Java, or type classes in Haskell (whatever you are more familiar with). They allow you to abstract over behavior that can be shared by different types. Code that uses traits can express ideas in very generic and flexible ways. This means it can also get difficult to read, though. Don’t let that intimidate you: Even people who have used Rust for years don’t always get what generic code does immediately. In that case, it helps to think of concrete uses. For example, in our case, the behavior that we abstract over is “write to it”. Examples for the types that implement (“impl”) it include: The terminal’s standard output, files, a buffer in memory, or TCP network connections. (Scroll down in the documentation for std::io::Write to see a list of “Implementors”.)

With that knowledge, let’s change our function to accept a third parameter. It should be of any type that implements Write. This way, we can then supply a simple string in our tests and make assertions on it. Here is how we can write this version of find_matches:

fn find_matches(content: &str, pattern: &str, mut writer: impl std::io::Write) {
    for line in content.lines() {
        if line.contains(pattern) {
            writeln!(writer, "{}", line);
        }
    }
}

The new parameter is mut writer, i.e., a mutable thing we call "writer". Its type is impl std::io::Write, which you can read as "a placeholder for any type that implements the Write trait". Also note how we replaced the println!(…) we used earlier with writeln!(writer, …). println! works the same as writeln! but always uses standard output.

Now we can test for the output:

#[test]
fn find_a_match() {
    let mut result = Vec::new();
    find_matches("lorem ipsum\ndolor sit amet", "lorem", &mut result);
    assert_eq!(result, b"lorem ipsum\n");
}

To now use this in our application code, we have to change the call to find_matches in main by adding &mut std::io::stdout() as the third parameter. Here’s an example of a main function that builds on what we’ve seen in the previous chapters and uses our extracted find_matches function:

fn main() -> Result<()> {
    let args = Cli::parse();
    let content = std::fs::read_to_string(&args.path)
        .with_context(|| format!("could not read file `{}`", args.path.display()))?;

    find_matches(&content, &args.pattern, &mut std::io::stdout());

    Ok(())
}

We’ve just seen how to make this piece of code easily testable. We have

  1. identified one of the core pieces of our application,
  2. put it into its own function,
  3. and made it more flexible.

Even though the goal was to make it testable, the result we ended up with is actually a very idiomatic and reusable piece of Rust code. That’s awesome!

Splitting your code into library and binary targets

We can do one more thing here. So far we’ve put everything we wrote into the src/main.rs file. This means our current project produces a single binary. But we can also make our code available as a library, like this:

  1. Put the find_matches function into a new src/lib.rs.
  2. Add a pub in front of the fn (so it’s pub fn find_matches) to make it something that users of our library can access.
  3. Remove find_matches from src/main.rs.
  4. In the fn main, prepend the call to find_matches with grrs::, so it’s now grrs::find_matches(…). This means it uses the function from the library we just wrote!
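To see the shape of this split before touching your project files, here's a single-file sketch where a module stands in for the library crate. In the real split, the module body would live in src/lib.rs (with find_matches as a top-level pub fn) and main would call grrs::find_matches just like below:

```rust
// Stand-in for src/lib.rs: in the real project this would be a separate
// library target, and `find_matches` would be a top-level `pub fn`.
mod grrs {
    pub fn find_matches(content: &str, pattern: &str, mut writer: impl std::io::Write) {
        for line in content.lines() {
            if line.contains(pattern) {
                // Panicking on write errors keeps the sketch short.
                writeln!(writer, "{}", line).expect("write failed");
            }
        }
    }
}

// Stand-in for src/main.rs: it only wires input and output to the library.
fn main() {
    let mut out = Vec::new();
    grrs::find_matches("lorem ipsum\ndolor sit amet", "lorem", &mut out);
    print!("{}", String::from_utf8(out).expect("valid UTF-8"));
    // prints: lorem ipsum
}
```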

The way Rust deals with projects is quite flexible and it’s a good idea to think about what to put into the library part of your crate early on. You can for example think about writing a library for your application-specific logic first and then use it in your CLI just like any other library. Or, if your project has multiple binaries, you can put the common functionality into the library part of that crate.

Testing CLI applications by running them

Thus far, we’ve gone out of our way to test the business logic of our application, which turned out to be the find_matches function. This is very valuable and is a great first step towards a well-tested code base. (Usually, these kinds of tests are called “unit tests”.)

There is a lot of code we aren’t testing, though: Everything that we wrote to deal with the outside world! Imagine you wrote the main function, but accidentally left in a hard-coded string instead of using the argument of the user-supplied path. We should write tests for that, too! (This level of testing is often called “integration testing”, or “system testing”.)

At its core, we are still writing functions and annotating them with #[test]. It’s just a matter of what we do inside these functions. For example, we’ll want to use the main binary of our project, and run it like a regular program. We will also put these tests into a new file in a new directory: tests/cli.rs.

To recall, grrs is a small tool that searches for a string in a file. We have previously tested that we can find a match. Let’s think about what other functionality we can test.

Here is what I came up with.

  • What happens when the file doesn’t exist?
  • What is the output when there is no match?
  • Does our program exit with an error when we forget one (or both) arguments?

These are all valid test cases. Additionally, we should also include one test case for the “happy path”, i.e., we found at least one match and we print it.

To make these kinds of tests easier, we’re going to use the assert_cmd crate. It has a bunch of neat helpers that allow us to run our main binary and see how it behaves. Further, we’ll also add the predicates crate which helps us write assertions that assert_cmd can test against (and that have great error messages). We’ll add those dependencies not to the main list, but to a “dev dependencies” section in our Cargo.toml. They are only required when developing the crate, not when using it.

[dev-dependencies]
assert_cmd = "2.0.14"
predicates = "3.1.0"

This sounds like a lot of setup. Nevertheless – let’s dive right in and create our tests/cli.rs file:

use assert_cmd::prelude::*; // Add methods on commands
use predicates::prelude::*; // Used for writing assertions
use std::process::Command; // Run programs

#[test]
fn file_doesnt_exist() -> Result<(), Box<dyn std::error::Error>> {
    let mut cmd = Command::cargo_bin("grrs")?;

    cmd.arg("foobar").arg("test/file/doesnt/exist");
    cmd.assert()
        .failure()
        .stderr(predicate::str::contains("could not read file"));

    Ok(())
}

You can run this test with cargo test, just like the tests we wrote above. It might take a little longer the first time, as Command::cargo_bin("grrs") needs to compile your main binary.

Generating test files

The test we’ve just seen only checks that our program writes an error message when the input file doesn’t exist. That’s an important test to have, but maybe not the most important one: Let’s now test that we will actually print the matches we found in a file!

We’ll need to have a file whose content we know, so that we can know what our program should return and check this expectation in our code. One idea might be to add a file to the project with custom content and use that in our tests. Another would be to create temporary files in our tests. For this tutorial, we’ll have a look at the latter approach. Mainly, because it is more flexible and will also work in other cases; for example, when you are testing programs that change the files.

To create these temporary files, we’ll be using the assert_fs crate. Let’s add it to the dev-dependencies in our Cargo.toml:

assert_fs = "1.1.1"

Here is a new test case (that you can write below the other one) that first creates a temp file (a “named” one so we can get its path), fills it with some text, and then runs our program to see if we get the correct output. When the file goes out of scope (at the end of the function), the actual temporary file will automatically get deleted.

use assert_fs::prelude::*;

#[test]
fn find_content_in_file() -> Result<(), Box<dyn std::error::Error>> {
    let file = assert_fs::NamedTempFile::new("sample.txt")?;
    file.write_str("A test\nActual content\nMore content\nAnother test")?;

    let mut cmd = Command::cargo_bin("grrs")?;
    cmd.arg("test").arg(file.path());
    cmd.assert()
        .success()
        .stdout(predicate::str::contains("A test\nAnother test"));

    Ok(())
}

What to test?

While it can certainly be fun to write integration tests, it will also take some time to write them, as well as to update them when your application’s behavior changes. To make sure you use your time wisely, you should ask yourself what you should test.

In general it’s a good idea to write integration tests for all types of behavior that a user can observe. That means that you don’t need to cover all edge cases: It usually suffices to have examples for the different types and rely on unit tests to cover the edge cases.

It is also a good idea not to focus your tests on things you can’t actively control. It would be a bad idea to test the exact layout of --help as it is generated for you. Instead, you might just want to check that certain elements are present.

Depending on the nature of your program, you can also try to add more testing techniques. For example, if you have extracted parts of your program and find yourself writing a lot of example cases as unit tests while trying to come up with all the edge cases, you should look into proptest. If you have a program which consumes arbitrary files and parses them, try to write a fuzzer to find bugs in edge cases.

Packaging and distributing a Rust tool

If you feel confident that your program is ready for other people to use, it is time to package and release it!

There are a few approaches, and we’ll look at three of them from “quickest to set up” to “most convenient for users”.

Quickest: cargo publish

The easiest way to publish your app is with cargo. Do you remember how we added external dependencies to our project? Cargo downloaded them from its default “crate registry”, crates.io. With cargo publish, you too can publish crates to crates.io. And this works for all crates, including those with binary targets.

Publishing a crate to crates.io is pretty straightforward: If you haven’t already, create an account on crates.io. Currently, this is done via authorizing you on GitHub, so you’ll need to have a GitHub account (and be logged in there). Next, you log in using cargo on your local machine. For that, go to your crates.io account page, create a new token, and then run cargo login <your-new-token>. You only need to do this once per computer. You can learn more about this in cargo’s publishing guide.

Now that cargo as well as crates.io know you, you are ready to publish crates. Before you hastily go ahead and publish a new crate (version), it’s a good idea to open your Cargo.toml once more and make sure you added the necessary metadata. You can find all the possible fields you can set in the documentation for cargo’s manifest format. Here’s a quick overview of some common entries:

[package]
name = "grrs"
version = "0.1.0"
authors = ["Your Name <your@email.com>"]
license = "MIT OR Apache-2.0"
description = "A tool to search files"
readme = "README.md"
homepage = "https://github.com/you/grrs"
repository = "https://github.com/you/grrs"
keywords = ["cli", "search", "demo"]
categories = ["command-line-utilities"]

How to install a binary from crates.io

We’ve seen how to publish a crate to crates.io, and you might be wondering how to install it. In contrast to libraries, which cargo will download and compile for you when you run cargo build (or a similar command), you’ll need to tell it to explicitly install binaries.

This is done using cargo install <crate-name>. It will by default download the crate, compile all the binary targets it contains (in “release” mode, so it might take a while) and copy them into the ~/.cargo/bin/ directory. (Make sure that your shell knows to look there for binaries!)

It’s also possible to install crates from git repositories, only install specific binaries of a crate, and specify an alternative directory to install them to. Have a look at cargo install --help for details.

When to use it

cargo install is a simple way to install a binary crate. It’s very convenient for Rust developers to use, but has some significant downsides: Since it will always compile your source from scratch, users of your tool will need to have Rust, cargo, and all other system dependencies your project requires to be installed on their machine. Compiling large Rust codebases can also take some time.

It’s best to use this for distributing tools that are targeted at other Rust developers. For example: A lot of cargo subcommands like cargo-tree or cargo-outdated can be installed with it.

Distributing binaries

Rust is a language that compiles to native code and by default statically links all dependencies. When you run cargo build on your project that contains a binary called grrs, you’ll end up with a binary file called grrs. Try it out: Using cargo build, it’ll be target/debug/grrs, and when you run cargo build --release, it’ll be target/release/grrs. Unless you use crates that explicitly need external libraries to be installed on the target system (like using the system’s version of OpenSSL), this binary will only depend on common system libraries. That means, you take that one file, send it to people running the same operating system as you, and they’ll be able to run it.

This is already very powerful! It works around two of the downsides we just saw for cargo install: There is no need to have Rust installed on the user’s machine, and instead of it taking a minute to compile, they can instantly run the binary.

So, as we’ve seen, cargo build already builds binaries for us. The only issue is, those are not guaranteed to work on all platforms. If you run cargo build on your Windows machine, you won’t get a binary that works on a Mac by default. Is there a way to generate these binaries for all the interesting platforms automatically?

Building binary releases on CI

If your tool is open sourced and hosted on GitHub, it’s quite easy to set up a free CI (continuous integration) service like Travis CI. (There are other services that also work on other platforms, but Travis is very popular.) This basically runs setup commands in a virtual machine each time you push changes to your repository. What those commands are, and the types of machines they run on, is configurable. For example: A good idea is to run cargo test on a machine with Rust and some common build tools installed. If this fails, you know there are issues in the most recent changes.

We can also use this to build binaries and upload them to GitHub! Indeed, if we run cargo build --release and upload the binary somewhere, we should be all set, right? Not quite. We still need to make sure the binaries we build are compatible with as many systems as possible. For example, on Linux we can compile not for the current system, but instead for the x86_64-unknown-linux-musl target, to not depend on default system libraries. On macOS, we can set MACOSX_DEPLOYMENT_TARGET to 10.7 to only depend on system features present in versions 10.7 and older.

You can see one example of building binaries using this approach here for Linux and macOS and here for Windows (using AppVeyor).

Another way is to use pre-built (Docker) images that contain all the tools we need to build binaries. This allows us to easily target more exotic platforms, too. The trust project contains scripts that you can include in your project as well as instructions on how to set this up. It also includes support for Windows using AppVeyor.

If you’d rather set this up locally and generate the release files on your own machine, still have a look at trust. It uses cross internally, which works similarly to cargo but forwards commands to a cargo process inside a Docker container. The definitions of the images are also available in cross’ repository.

How to install these binaries

You point your users to your release page that might look something like this one, and they can download the artifacts we’ve just created. The release artifacts we’ve just generated are nothing special: At the end, they are just archive files that contain our binaries! This means that users of your tool can download them with their browser, extract them (often happens automatically), and copy the binaries to a place they like.
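To make the “download, extract, copy” flow concrete, here is a shell sketch. The archive name is hypothetical, and the download itself is simulated by creating the archive locally:

```shell
# Hypothetical archive name; real artifacts are usually named per target triple.
# Simulate the "download" by creating a small archive locally.
mkdir -p demo && printf 'fake grrs binary' > demo/grrs
tar -czf grrs-x86_64-unknown-linux-musl.tar.gz -C demo grrs

# What a user does after downloading: extract the archive, then copy the
# binary to a directory on their PATH.
tar -xzf grrs-x86_64-unknown-linux-musl.tar.gz
mkdir -p "$HOME/.local/bin"
cp grrs "$HOME/.local/bin/"
echo "installed to $HOME/.local/bin/grrs"
```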

This does require some experience with manually “installing” programs, so you want to add a section to your README file on how to install this program.

When to use it

Having binary releases is a good idea in general, there’s hardly any downside to it. It does not solve the problem of users having to manually install and update your tools, but they can quickly get the latest released version without the need to install Rust.

What to package in addition to your binaries

Right now, when a user downloads our release builds, they will get a .tar.gz file that only contains binary files. So, in our example project, they will just get a single grrs file they can run. But there are some more files we already have in our repository that they might want to have. The README file that tells them how to use this tool, and the license file(s), for example. Since we already have them, they are easy to add.

There are some more interesting files that make sense especially for command-line tools, though: How about we also ship a man page in addition to that README file, and config files that add completions of the possible flags to your shell? You can write these by hand, but clap, the argument parsing library we use, has a way to generate all these files for us. See this in-depth chapter for more details.

Getting your app into package repositories

Both approaches we’ve seen so far are not how you typically install software on your machine. On most operating systems, you install command-line tools using global package managers. The advantages for users are quite obvious: There is no need to think about how to install your program if it can be installed the same way as they install their other tools. These package managers also allow users to update their programs when a new version is available.

Sadly, supporting different systems means you’ll have to look at how these different systems work. For some, it might be as easy as adding a file to your repository (e.g. adding a Formula file like this for macOS’s brew), but for others you’ll often need to send in patches yourself and add your tool to their repositories. There are helpful tools like cargo-bundle, cargo-deb, and cargo-aur, but describing how they work and how to correctly package your tool for those different systems is beyond the scope of this chapter.

Instead, let’s have a look at a tool that is written in Rust and that is available in many different package managers.

An example: ripgrep

ripgrep is an alternative to grep/ack/ag and is written in Rust. It’s quite successful and is packaged for many operating systems: Just look at the “Installation” section of its README!

Note that it lists a few different options for how you can install it: It starts with a link to the GitHub releases, which contain the binaries, so you can download them directly; then it lists how to install it using a number of different package managers; finally, you can also install it using cargo install.

This seems like a very good idea: Don’t pick and choose one of the approaches presented here, but start with cargo install, add binary releases, and finally start distributing your tool using system package managers.

In-depth topics

This chapter covers a few smaller, more advanced details that you may still care about when writing a command-line application.

Signal handling

Processes like command line applications need to react to signals sent by the operating system. The most common example is probably Ctrl+C, the signal that typically tells a process to terminate. To handle signals in Rust programs you need to consider how you can receive these signals as well as how you can react to them.

Differences between operating systems

On Unix systems (like Linux, macOS, and FreeBSD) a process can receive signals. It can either react to them in a default (OS-provided) way, catch the signal and handle them in a program-defined way, or ignore the signal entirely.

Windows does not have signals. You can use Console Handlers to define callbacks that get executed when an event occurs. There is also structured exception handling, which handles all the various types of system exceptions such as division by zero, invalid access exceptions, stack overflow, and so on.

First off: Handling Ctrl+C

The ctrlc crate does just what the name suggests: It allows you to react to the user pressing Ctrl+C, in a cross-platform way. The main way to use the crate is this:

use std::{thread, time::Duration};

fn main() {
    ctrlc::set_handler(move || {
        println!("received Ctrl+C!");
    })
    .expect("Error setting Ctrl-C handler");

    // Following code does the actual work, and can be interrupted by pressing
    // Ctrl-C. As an example: Let's wait a few seconds.
    thread::sleep(Duration::from_secs(2));
}

This is, of course, not that helpful: It only prints a message but otherwise doesn’t stop the program.

In a real-world program, it’s a good idea to instead set a variable in the signal handler that you then check in various places in your program. For example, you can set an Arc<AtomicBool> (a boolean shareable between threads) in your signal handler, and in hot loops, or when waiting for a thread, you periodically check its value and break when it becomes true.
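A dependency-free sketch of this pattern; the signal handler is simulated here by a thread that flips the flag after a short delay, whereas in a real program ctrlc::set_handler would do the store:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

fn main() {
    // The shared flag the signal handler will set.
    let stop = Arc::new(AtomicBool::new(false));

    // Stand-in for the Ctrl+C handler: set the flag after 50 ms.
    let setter = Arc::clone(&stop);
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        setter.store(true, Ordering::SeqCst);
    });

    // Hot loop: periodically check the flag and break once it becomes true.
    let mut iterations: u64 = 0;
    while !stop.load(Ordering::SeqCst) {
        iterations += 1;
        thread::sleep(Duration::from_millis(5));
    }
    println!("stopped cleanly after {} iterations", iterations);
}
```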

Handling other types of signals

The ctrlc crate only handles Ctrl+C, or, what on Unix systems would be called SIGINT (the “interrupt” signal). To react to more Unix signals, you should have a look at signal-hook. Its design is described in this blog post, and it is currently the library with the widest community support.

Here’s a simple example:

use signal_hook::{consts::SIGINT, iterator::Signals};
use std::{error::Error, thread, time::Duration};

fn main() -> Result<(), Box<dyn Error>> {
    let mut signals = Signals::new(&[SIGINT])?;

    thread::spawn(move || {
        for sig in signals.forever() {
            println!("Received signal {:?}", sig);
        }
    });

    // Following code does the actual work, and can be interrupted by pressing
    // Ctrl-C. As an example: Let's wait a few seconds.
    thread::sleep(Duration::from_secs(2));

    Ok(())
}

Using channels

Instead of setting a variable and having other parts of the program check it, you can use channels: You create a channel into which the signal handler emits a value whenever the signal is received. In your application code you use this and other channels as synchronization points between threads. Using crossbeam-channel it would look something like this:

use std::time::Duration;
use crossbeam_channel::{bounded, tick, Receiver, select};
use anyhow::Result;

fn ctrl_channel() -> Result<Receiver<()>, ctrlc::Error> {
    let (sender, receiver) = bounded(100);
    ctrlc::set_handler(move || {
        let _ = sender.send(());
    })?;

    Ok(receiver)
}

fn main() -> Result<()> {
    let ctrl_c_events = ctrl_channel()?;
    let ticks = tick(Duration::from_secs(1));

    loop {
        select! {
            recv(ticks) -> _ => {
                println!("working!");
            }
            recv(ctrl_c_events) -> _ => {
                println!();
                println!("Goodbye!");
                break;
            }
        }
    }

    Ok(())
}

Using futures and streams

If you are using tokio, you are most likely already writing your application with asynchronous patterns and an event-driven design. Instead of using crossbeam’s channels directly, you can enable signal-hook’s tokio-support feature. This allows you to call .into_async() on signal-hook’s Signals types to get a new type that implements futures::Stream.

What to do when you receive another Ctrl+C while you’re handling the first Ctrl+C

Most users will press Ctrl+C, and then give your program a few seconds to exit, or tell them what’s going on. If that doesn’t happen, they will press Ctrl+C again. The typical behavior is to have the application quit immediately.
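One way to implement “graceful on the first Ctrl+C, immediate on the second” is to count interrupts. A std-only sketch: the helper name `on_interrupt` is made up for illustration, and in a real program it would be called from your ctrlc handler:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// How many times the interrupt signal has fired so far.
static INTERRUPTS: AtomicUsize = AtomicUsize::new(0);

// Returns true when the program should quit immediately,
// i.e., on the second and later presses.
fn on_interrupt() -> bool {
    let pressed = INTERRUPTS.fetch_add(1, Ordering::SeqCst) + 1;
    pressed >= 2
}

fn main() {
    assert!(!on_interrupt()); // first press: begin graceful shutdown
    assert!(on_interrupt()); // second press: quit immediately
}
```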

Using config files

Dealing with configurations can be annoying, especially if you support multiple operating systems, which all have their own places for short- and long-term files.

There are multiple solutions to this, some being more low-level than others.

The easiest crate to use for this is confy. It asks you for the name of your application, requires you to specify the config layout via a struct (that is Serialize and Deserialize), and it figures out the rest!

use serde::{Deserialize, Serialize};

#[derive(Debug, Default, Serialize, Deserialize)]
struct MyConfig {
    name: String,
    comfy: bool,
    foo: i64,
}

fn main() -> Result<(), confy::ConfyError> {
    let cfg: MyConfig = confy::load("my_app")?;
    println!("{:#?}", cfg);
    Ok(())
}

This is incredibly easy to use, for which you of course surrender configurability. But if a simple config is all you need, this crate might be for you!

Configuration environments

Exit codes

A program doesn’t always succeed.

And when errors occur, you should make sure to emit the necessary information correctly. In addition to telling the user about errors, on most systems, when a process exits, it also emits an exit code (an integer between 0 and 255 is compatible with most platforms).

You should try to emit the correct code for your program’s state. For example, in the ideal case when your program succeeded, it should exit with 0.

So, how do we do that? The BSD ecosystem has collected a common definition for their exit codes (you can find them here). The Rust library exitcode provides these same codes, ready to be used in your application. Please see its API documentation for usage.

After you add the exitcode dependency to your Cargo.toml, you can use it like this:

fn main() {
    // ...actual work...
    match result {
        Ok(_) => {
            println!("Done!");
            std::process::exit(exitcode::OK);
        }
        Err(CustomError::CantReadConfig(e)) => {
            eprintln!("Error: {}", e);
            std::process::exit(exitcode::CONFIG);
        }
        Err(e) => {
            eprintln!("Error: {}", e);
            std::process::exit(exitcode::DATAERR);
        }
    }
}

Communicating with humans

Make sure to read the chapter on CLI output in the tutorial first.

It covers how to write output to the terminal, while this chapter will talk about what to output.

When everything is fine

It is useful to report on the application’s progress even when everything is fine. Try to be informative and concise in these messages. Don’t use overly technical terms in the logs. Remember: the application is not crashing so there’s no reason for users to look up errors.

Most importantly, be consistent in the style of communication. Use the same prefixes and sentence structure to make the logs easily skimmable.

Try to let your application output tell a story about what it’s doing and how it impacts the user. This can involve showing a timeline of steps involved or even a progress bar and indicator for long-running actions. The user should at no point get the feeling that the application is doing something mysterious that they cannot follow.

When it’s hard to tell what’s going on

When communicating non-nominal state it’s important to be consistent. A heavily logging application that doesn’t follow strict logging levels provides the same amount of, or even less, information than a non-logging application.

Because of this, it’s important to define the severity of events and messages that are related to it; then use consistent log levels for them. This way users can select the amount of logging themselves via --verbose flags or environment variables (like RUST_LOG).

The commonly used log crate defines the following levels (ordered by increasing severity):

  • trace
  • debug
  • info
  • warn
  • error

It’s a good idea to think of info as the default log level. Use it for, well, informative output. (Some applications that lean towards a more quiet output style might only show warnings and errors by default.)

Additionally, it’s always a good idea to use similar prefixes and sentence structure across log messages, making it easy to use a tool like grep to filter for them. A message should provide enough context by itself to be useful in a filtered log while not being too verbose at the same time.
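To illustrate why consistent prefixes matter, here is a small sketch (the prefix format is hypothetical) that filters a log the same way a user would with grep:

```rust
fn main() {
    // A log with consistent, easily matchable prefixes (hypothetical format).
    let log = "[INFO] starting up\n\
               [WARN] config file missing, using defaults\n\
               [INFO] processing 3 files\n\
               [ERROR] could not read data.txt";

    // Because every line starts with its level, filtering is trivial,
    // just like `grep -E '^\[(WARN|ERROR)\]'` on the command line.
    let problems: Vec<&str> = log
        .lines()
        .filter(|line| line.starts_with("[WARN]") || line.starts_with("[ERROR]"))
        .collect();

    for line in &problems {
        println!("{}", line);
    }
    assert_eq!(problems.len(), 2);
}
```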

Example log statements

error: could not find `Cargo.toml` in `/home/you/project/`
=> Downloading repository index
=> Downloading packages...

The following log output is taken from wasm-pack:

 [1/7] Adding WASM target...
 [2/7] Compiling to WASM...
 [3/7] Creating a pkg directory...
 [4/7] Writing a package.json...
 > [WARN]: Field `description` is missing from Cargo.toml. It is not necessary, but recommended
 > [WARN]: Field `repository` is missing from Cargo.toml. It is not necessary, but recommended
 > [WARN]: Field `license` is missing from Cargo.toml. It is not necessary, but recommended
 [5/7] Copying over your README...
 > [WARN]: origin crate has no README
 [6/7] Installing WASM-bindgen...
 > [INFO]: wasm-bindgen already installed
 [7/7] Running WASM-bindgen...
 Done in 1 second

When panicking

One aspect often forgotten is that your program also outputs something when it crashes. In Rust, “crashes” are most often “panics” (i.e., “controlled crashing” in contrast to “the operating system killed the process”). By default, when a panic occurs, a “panic handler” will print some information to the console.

For example, if you create a new binary project with cargo new --bin foo and replace the content of fn main with panic!("Hello, world!"), you get this when you run your program:

thread 'main' panicked at 'Hello, world!', src/main.rs:2:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

This is useful information to you, the developer. (Surprise: the program crashed because of line 2 in your main.rs file.) But for a user who doesn’t even have access to the source code, this is not very valuable. In fact, it is most likely just confusing. That’s why it’s a good idea to add a custom panic handler that provides more end-user focused output.
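For reference, the standard library alone lets you install such a hook via std::panic::set_hook. A minimal sketch follows; the human-panic crate introduced next does the same with much friendlier output, and catch_unwind is only used here so the demo doesn’t abort:

```rust
use std::panic;

fn main() {
    // Replace the default panic output with an end-user focused message.
    panic::set_hook(Box::new(|info| {
        eprintln!("Sorry, the program hit an internal error and had to stop.");
        eprintln!("Details for a bug report: {}", info);
    }));

    // Trigger a panic; catch_unwind keeps this demo process alive.
    let result = panic::catch_unwind(|| panic!("Hello, world!"));
    assert!(result.is_err());
}
```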

One library that does just that is called human-panic. To add it to your CLI project, you import it and call the setup_panic!() macro at the beginning of your main function:

use human_panic::setup_panic;

fn main() {
   setup_panic!();

   panic!("Hello world")
}

This will now show a very friendly message, and tells the user what they can do:

Well, this is embarrassing.

foo had a problem and crashed. To help us diagnose the problem you can send us a crash report.

We have generated a report file at "/var/folders/n3/dkk459k908lcmkzwcmq0tcv00000gn/T/report-738e1bec-5585-47a4-8158-f1f7227f0168.toml". Submit an issue or email with the subject of "foo Crash Report" and include the report as an attachment.

- Authors: Your Name <your.name@example.com>

We take privacy seriously, and do not perform any automated error collection. In order to improve the software, we rely on people to submit reports.

Thank you kindly!

Communicating with machines

The power of command-line tools really comes to shine when you are able to combine them. This is not a new idea: In fact, this is a sentence from the Unix philosophy:

Expect the output of every program to become the input to another, as yet unknown, program.

If our programs fulfill this expectation, our users will be happy. To make sure this works well, we should provide not just pretty output for humans, but also a version tailored to what other programs need. Let’s see how we can do this.

Who’s reading this?

The first question to ask is: Is our output for a human in front of a colorful terminal, or for another program? To answer this, we can use a crate like is-terminal:

use is_terminal::IsTerminal as _;

if std::io::stdout().is_terminal() {
    println!("I'm a terminal");
} else {
    println!("I'm not");
}

Depending on who will read our output, we can then add extra information. Humans tend to like colors, for example, if you run ls in a random Rust project, you might see something like this:

$ ls
CODE_OF_CONDUCT.md   LICENSE-APACHE       examples
CONTRIBUTING.md      LICENSE-MIT          proptest-regressions
Cargo.lock           README.md            src
Cargo.toml           convey_derive        target

Because this style is made for humans, in most configurations it’ll even print some of the names (like src) in color to show that they are directories. If you instead pipe this to a file, or a program like cat, ls will adapt its output. Instead of using columns that fit my terminal window it will print every entry on its own line. It will also not emit any colors.

$ ls | cat
CODE_OF_CONDUCT.md
CONTRIBUTING.md
Cargo.lock
Cargo.toml
LICENSE-APACHE
LICENSE-MIT
README.md
convey_derive
examples
proptest-regressions
src
target

Easy output formats for machines

Historically, the only type of output command-line tools produced were strings. This is usually fine for people in front of terminals, who are able to read text and reason about its meaning. Other programs usually don’t have that ability, though: The only way for them to understand the output of a tool like ls is if the author of the program included a parser that happens to work for whatever ls outputs.

This often means that output was limited to what is easy to parse. Formats like TSV (tab-separated values), where each record is on its own line, and each line contains tab-separated content, are very popular. These simple formats based on lines of text allow tools like grep to be used on the output of tools like ls: grep Cargo doesn’t care if your lines are from ls or file, it will just filter line by line.

The downside of this is that you can’t use an easy grep invocation to filter all the directories that ls gave you. For that, each directory item would need to carry additional data.
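To see what “carrying additional data” buys you, here is a sketch that parses a hypothetical ls-like TSV listing with an explicit kind field, so filtering for directories no longer depends on guessing from names:

```rust
fn main() {
    // Hypothetical `ls`-like TSV output: name <TAB> kind, one record per line.
    let output = "Cargo.toml\tfile\nsrc\tdirectory\ntarget\tdirectory";

    // With an explicit kind field, selecting directories is unambiguous.
    let dirs: Vec<&str> = output
        .lines()
        .filter_map(|line| {
            let mut fields = line.split('\t');
            let name = fields.next()?;
            match fields.next() {
                Some("directory") => Some(name),
                _ => None,
            }
        })
        .collect();

    assert_eq!(dirs, ["src", "target"]);
}
```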

JSON output for machines

Tab-separated values is a simple way to output structured data but it requires the other program to know which fields to expect (and in which order) and it’s difficult to output messages of different types. For example, let’s say our program wanted to message the consumer that it is currently waiting for a download, and afterwards output a message describing the data it got. Those are very different kinds of messages and trying to unify them in a TSV output would require us to invent a way to differentiate them. Same when we wanted to print a message that contains two lists of items of varying lengths.

Still, it’s a good idea to choose a format that is easily parsable in most programming languages/environments. Thus, over the last few years a lot of applications gained the ability to output their data in JSON. It’s simple enough that parsers exist in practically every language yet powerful enough to be useful in a lot of cases. While it’s a text format that can be read by humans, a lot of people have also worked on implementations that are very fast at parsing JSON data and serializing data to JSON.

In the description above, we’ve talked about “messages” being written by our program. This is a good way of thinking about the output: Your program doesn’t necessarily only output one blob of data but may in fact emit a lot of different information while it is running. One easy way to support this approach when outputting JSON is to write one JSON document per message and to put each JSON document on a new line (sometimes called line-delimited JSON). This can make implementations as simple as using a regular println!.
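A dependency-free sketch of line-delimited JSON output; the JSON is assembled by hand (the `message` helper is made up and does no string escaping), and real code should build documents with serde_json as in the following example:

```rust
// Hypothetical helper: one message becomes one single-line JSON document.
// No string escaping is done here; keep the inputs simple.
fn message(kind: &str, content: &str) -> String {
    format!("{{\"type\":\"{}\",\"content\":\"{}\"}}", kind, content)
}

fn main() {
    // Each println! emits exactly one JSON document per line, so a consumer
    // can parse the stream line by line while we are still running.
    let messages = [
        message("status", "waiting for download"),
        message("data", "got 42 records"),
    ];
    for m in &messages {
        println!("{}", m);
    }
    assert!(messages[0].starts_with("{\"type\":\"status\""));
}
```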

Here’s a simple example, using the json! macro from serde_json to quickly write valid JSON in your Rust source code:

use clap::Parser;
use serde_json::json;

/// Search for a pattern in a file and display the lines that contain it.
#[derive(Parser)]
struct Cli {
    /// Output JSON instead of human readable messages
    #[arg(long = "json")]
    json: bool,
}

fn main() {
    let args = Cli::parse();
    if args.json {
        println!(
            "{}",
            json!({
                "type": "message",
                "content": "Hello world",
            })
        );
    } else {
        println!("Hello world");
    }
}

And here is the output:

$ cargo run -q
Hello world
$ cargo run -q -- --json
{"content":"Hello world","type":"message"}

(Running cargo with -q suppresses its usual output. The arguments after -- are passed to our program.)

Practical example: ripgrep

ripgrep is a replacement for grep or ag, written in Rust. By default it will produce output like this:

$ rg default
src/lib.rs
37:    Output::default()

src/components/span.rs
6:    Span::default()

But given --json it will print:

$ rg default --json
{"type":"begin","data":{"path":{"text":"src/lib.rs"}}}
{"type":"match","data":{"path":{"text":"src/lib.rs"},"lines":{"text":"    Output::default()\n"},"line_number":37,"absolute_offset":761,"submatches":[{"match":{"text":"default"},"start":12,"end":19}]}}
{"type":"end","data":{"path":{"text":"src/lib.rs"},"binary_offset":null,"stats":{"elapsed":{"secs":0,"nanos":137622,"human":"0.000138s"},"searches":1,"searches_with_match":1,"bytes_searched":6064,"bytes_printed":256,"matched_lines":1,"matches":1}}}
{"type":"begin","data":{"path":{"text":"src/components/span.rs"}}}
{"type":"match","data":{"path":{"text":"src/components/span.rs"},"lines":{"text":"    Span::default()\n"},"line_number":6,"absolute_offset":117,"submatches":[{"match":{"text":"default"},"start":10,"end":17}]}}
{"type":"end","data":{"path":{"text":"src/components/span.rs"},"binary_offset":null,"stats":{"elapsed":{"secs":0,"nanos":22025,"human":"0.000022s"},"searches":1,"searches_with_match":1,"bytes_searched":5221,"bytes_printed":277,"matched_lines":1,"matches":1}}}
{"data":{"elapsed_total":{"human":"0.006995s","nanos":6994920,"secs":0},"stats":{"bytes_printed":533,"bytes_searched":11285,"elapsed":{"human":"0.000160s","nanos":159647,"secs":0},"matched_lines":2,"matches":2,"searches":2,"searches_with_match":2}},"type":"summary"}

As you can see, each JSON document is an object (map) containing a type field. This would allow us to write a simple frontend for rg that reads these documents as they come in and show the matches (as well the files they are in) even while ripgrep is still searching.

How to deal with input piped into us

Let’s say we have a program that counts the number of words in a file:

use clap::Parser;
use std::path::PathBuf;

/// Count the number of words in a file
#[derive(Parser)]
#[command(arg_required_else_help = true)]
struct Cli {
    /// The path to the file to read
    file: PathBuf,
}

fn main() {
    let args = Cli::parse();
    let mut word_count = 0;
    let file = args.file;

    for line in std::fs::read_to_string(&file).unwrap().lines() {
        word_count += line.split(' ').count();
    }

    println!("Words in {}: {}", file.to_str().unwrap(), word_count)
}

It takes the path to a file, reads it line by line, and counts the number of words separated by a space.

When you run it, it outputs the total words in the file:

$ cargo run README.md
Words in README.md: 47

But what if we wanted to count the number of words piped into the program? Rust programs can read data passed in via stdin with the Stdin struct which you can obtain via the stdin function from the standard library. Similar to reading the lines of a file, it can read the lines from stdin.

Here’s a program that counts the words of what’s piped in via stdin

use clap::{CommandFactory, Parser};
use is_terminal::IsTerminal as _;
use std::{
    fs::File,
    io::{stdin, BufRead, BufReader},
    path::PathBuf,
};

/// Count the number of words in a file or stdin
#[derive(Parser)]
#[command(arg_required_else_help = true)]
struct Cli {
    /// The path to the file to read, use - to read from stdin (must not be a tty)
    file: PathBuf,
}

fn main() {
    let args = Cli::parse();

    let word_count;
    let mut file = args.file;

    if file == PathBuf::from("-") {
        if stdin().is_terminal() {
            Cli::command().print_help().unwrap();
            ::std::process::exit(2);
        }

        file = PathBuf::from("<stdin>");
        word_count = words_in_buf_reader(BufReader::new(stdin().lock()));
    } else {
        word_count = words_in_buf_reader(BufReader::new(File::open(&file).unwrap()));
    }

    println!("Words from {}: {}", file.to_string_lossy(), word_count)
}

fn words_in_buf_reader<R: BufRead>(buf_reader: R) -> usize {
    let mut count = 0;
    for line in buf_reader.lines() {
        count += line.unwrap().split(' ').count()
    }
    count
}

If you run that program with text piped in, with - representing the intent to read from stdin, it’ll output the word count:

$ echo "hi there friend" | cargo run -- -
Words from <stdin>: 3

It requires that stdin is not interactive because we’re expecting input that’s piped through to the program, not text that’s typed in at runtime. If stdin is a tty, it outputs the help docs so that it’s clear why it doesn’t work.

Rendering documentation for your CLI apps

Documentation for CLIs usually consists of a --help section in the command and a manual (man) page.

Both can be auto-generated when using clap, via the clap_mangen crate.

use clap::Parser;
use std::path::PathBuf;

#[derive(Parser)]
pub struct Head {
    /// file to load
    pub file: PathBuf,
    /// how many lines to print
    #[arg(short = 'n', default_value = "5")]
    pub count: usize,
}

Second, you use a build.rs to generate the man page at compile time from the definition of your app in code.

There are a few things to keep in mind here (such as how you want to package your binary), but for now we simply put the man file next to our src folder.

use clap::CommandFactory;

#[path="src/cli.rs"]
mod cli;

fn main() -> std::io::Result<()> {
    let out_dir = std::path::PathBuf::from(std::env::var_os("OUT_DIR").ok_or_else(|| std::io::ErrorKind::NotFound)?);
    let cmd = cli::Head::command();

    let man = clap_mangen::Man::new(cmd);
    let mut buffer: Vec<u8> = Default::default();
    man.render(&mut buffer)?;

    std::fs::write(out_dir.join("head.1"), buffer)?;

    Ok(())
}

If you now compile your application, a head.1 file will be generated in your build directory.

If you open that file with man, you’ll be able to admire your documentation.

Resources

Collaboration / help

Crates referenced in this book

  • anyhow - provides anyhow::Error for easy error handling
  • assert_cmd - simplifies integration testing of CLIs
  • assert_fs - Setup input files and test output files
  • clap-verbosity-flag - adds a --verbose flag to clap CLIs
  • clap - command line argument parser
  • confy - boilerplate-free configuration management
  • crossbeam-channel - provides multi-producer multi-consumer channels for message passing
  • ctrlc - easy ctrl-c handler
  • env_logger - implements a logger configurable via environment variables
  • exitcode - system exit code constants
  • human-panic - panic message handler
  • indicatif - progress bars and spinners
  • is-terminal - detects whether the application is running in a tty
  • log - provides logging abstracted over implementation
  • predicates - implements boolean-valued predicate functions
  • proptest - property testing framework
  • serde_json - serialize/deserialize to JSON
  • signal-hook - handles UNIX signals
  • tokio - asynchronous runtime
  • wasm-pack - tool for building WebAssembly

Other crates

As the landscape of Rust crates is constantly evolving, a good place to find API documentation and libraries is the lib.rs crate index, including:

Other resources: