agate

Simple gemini server for static files
git clone https://github.com/mbrubeck/agate.git

commit fdca5305910b16b9874aaf267d0b03e6394489a0
parent 49813d0c68137b24b70380068cc826742ef3b920
Author: Johann150 <johann.galle@protonmail.com>
Date:   Fri, 12 Feb 2021 14:50:27 +0100

allow globs in config file paths

The configuration parser will have to be changed again because YAML does not
support asterisks in its key names.

Diffstat:
M CHANGELOG.md    |  1 +
M Cargo.lock      |  7 +++++++
M Cargo.toml      |  1 +
M README.md       | 31 ++++++++++++++++++++++++-------
M src/metadata.rs | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++---------------
5 files changed, 86 insertions(+), 22 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -14,6 +14,7 @@ Thank you to @gegeweb for contributing to this release.
 * Disabling support for TLSv1.2 can now be done using the `--only-tls13` flag, but this is *NOT RECOMMENDED* (#12).
 * The tools now also contain a startup script for FreeBSD (#13).
 * Using central config mode (flag `-C`), all configuration can be done in one `.meta` file (see README.md for details).
+* The `.meta` configuration file now allows for globs to be used.
 
 ### Changed
 * The configuration files are now parsed as YAML. The syntax only changes in that a space is now required behind the colon.
diff --git a/Cargo.lock b/Cargo.lock
@@ -6,6 +6,7 @@ version = "2.4.1"
 dependencies = [
  "env_logger",
  "getopts",
+ "glob",
  "log",
  "mime_guess",
  "once_cell",
@@ -96,6 +97,12 @@ dependencies = [
 ]
 
 [[package]]
+name = "glob"
+version = "0.3.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9b919933a397b79c37e33b77bb2aa3dc8eb6e165ad809e58ff75bc7db2e34574"
+
+[[package]]
 name = "hermit-abi"
 version = "0.1.18"
 source = "registry+https://github.com/rust-lang/crates.io-index"
diff --git a/Cargo.toml b/Cargo.toml
@@ -23,6 +23,7 @@ percent-encoding = "2.1"
 rustls = "0.19.0"
 url = "2.2"
 yaml-rust = "0.4"
+glob = "0.3"
 
 [profile.release]
 lto = true
diff --git a/README.md b/README.md
@@ -65,9 +65,20 @@ A file called `index.gmi` will always take precedence over a directory listing.
 
 ### Meta-Presets
 
-You can put a file called `.meta` in a directory. This file stores some metadata about these files which Agate will use when serving these files. The file should be UTF-8 encoded. Like the `.directory-listing-ok` file, this file does not have an effect on sub-directories. (*1)
-This file is parsed as a YAML file and should contain a "hash" datatype with file names as the keys. This means:
-Lines starting with a `#` are comments and will be ignored like empty lines. All other lines must start with a file name, followed by a colon and a space and then the metadata.
+You can put a file called `.meta` in any content directory. This file stores some metadata about the adjacent files which Agate will use when serving these files. The `.meta` file must be UTF-8 encoded.
+You can also enable a central configuration file with the `-C` flag (or the long version `--central-conf`). In this case Agate will always look for the `.meta` configuration file in the content root directory and will ignore `.meta` files in other directories.
+
+The `.meta` file is parsed as a YAML file and should contain a "hash" datatype with file names as the keys. This means:
+* Lines starting with a `#` are comments and will be ignored, as will empty lines.
+* All other lines must have the form `<path>: <metadata`, i.e. start with a file path, followed by a colon and a space and then the metadata.
+
+`<path>` is a case sensitive file path, which may or may not exist on disk. If <path> leads to a directory, it is ignored.
+If central configuration file mode is not used, using a path that is not a file in the current directory is undefined behaviour (for example: `../index.gmi` would be undefined behaviour).
+You can use Unix style patterns in existing paths. For example `content/*` will match any file within `content`, and `content/**` will additionally match any files in subdirectories of `content`.
+However, the `*` and `**` globs on their own will by default not match files or directories that start with a dot because of their special meaning (see Directory listing).
+This behaviour can be disabled with `--serve-secret` or by explicitly matching files starting with a dot with e.g. `content/.*` or `content/**/.*` respectively.
+For more information on the patterns you can use, please see the [documentation of `glob::Pattern`](https://https://docs.rs/glob/0.3.0/glob/struct.Pattern.html).
+Rules can overwrite other rules, so if a file is matched by multiple rules, the last one applies.
 
 The metadata can take one of four possible forms:
 1. empty
@@ -85,14 +96,21 @@ If a line violates the format or looks like case 3, but is incorrect, it might b
 Such a configuration file might look like this:
 ```text
 # This line will be ignored.
+**.de.gmi: ;lang=de
+nl/**.gmi: ;lang=nl
 index.gmi: ;lang=en-UK
 LICENSE: text/plain;charset=UTF-8
 gone.gmi: 52 This file is no longer here, sorry.
 ```
 
-You can enable a central configuration file with the `-C` flag (or the long version `--central-conf`). In this case Agate will always look for the `.meta` configuration file in the content root directory and will ignore `.meta` files in other directories.
-
-(*1) It is *theoretically* possible to specify information on files which are in sub-directories. The problem would only be to make sure that this file is loaded before the respective path/file is requested. This is because Agate does not actively check that the "no sub-directories" regulation is met. In fact this might be dropped in a change of configuration format in the foreseeable future.
+If this is the `.meta` file in the content root directory and the `-C' flag is used, this will result in the following:
+requested filename|response header
+---|---
+`/ ` or `/index.gmi`|`20 text/gemini;lang=en-UK`
+`/LICENSE`|`20 text/plain;charset=UTF-8`
+`/gone.gmi`|`52 This file is no longer here, sorry.`
+any non-hidden file ending in `.de.gmi` (including in non-hidden subdirectories)|`20 text/gemini;lang=de`
+any non-hidden file in the `nl` directory ending in `.gmi` (including in non-hidden subdirectories)|`20 text/gemini;lang=nl`
 
 ### Logging Verbosity
 
@@ -109,7 +127,6 @@ If you want to serve the same content for multiple domains, you can instead disa
 [Gemini]: https://gemini.circumlunar.space/
 [Rust]: https://www.rust-lang.org/
 [home]: gemini://gem.limpet.net/agate/
-[rustup]: https://www.rust-lang.org/tools/install
 [source]: https://github.com/mbrubeck/agate
 [crates.io]: https://crates.io/crates/agate
 [documentation of `env_logger`]: https://docs.rs/env_logger/0.8
diff --git a/src/metadata.rs b/src/metadata.rs
@@ -1,3 +1,4 @@
+use glob::{glob_with, MatchOptions};
 use std::collections::BTreeMap;
 use std::path::{Path, PathBuf};
 use std::time::SystemTime;
@@ -69,20 +70,20 @@ impl FileOptions {
 
     /// Checks wether the database for the directory of the specified file is
     /// still up to date and re-reads it if outdated or not yet read.
-    fn update(&mut self, dir: &Path) {
-        let mut dir = if super::ARGS.central_config {
+    fn update(&mut self, file: &Path) {
+        let mut db = if super::ARGS.central_config {
             super::ARGS.content_dir.clone()
         } else {
-            dir.parent().expect("no parent directory").to_path_buf()
+            file.parent().expect("no parent directory").to_path_buf()
         };
-        dir.push(SIDECAR_FILENAME);
+        db.push(SIDECAR_FILENAME);
 
-        let should_read = if let Ok(metadata) = dir.as_path().metadata() {
+        let should_read = if let Ok(metadata) = db.as_path().metadata() {
             if !metadata.is_file() {
                 // it exists, but it is a directory
                 false
             } else if let (Ok(modified), Some(last_read)) =
-                (metadata.modified(), self.databases_read.get(&dir))
+                (metadata.modified(), self.databases_read.get(&db))
             {
                 // check that it was last modified before the read
                 // if the times are the same, we might have read the old file
@@ -99,7 +100,7 @@ impl FileOptions {
         };
 
         if should_read {
-            self.read_database(&dir);
+            self.read_database(&db);
         }
     }
 
@@ -109,6 +110,9 @@ impl FileOptions {
         log::trace!("reading database {:?}", db);
 
         if let Ok(contents) = std::fs::read_to_string(db) {
+            self.databases_read
+                .insert(db.to_path_buf(), SystemTime::now());
+
             let docs = match YamlLoader::load_from_str(&contents) {
                 Ok(docs) => docs,
                 Err(e) => {
@@ -136,7 +140,7 @@ impl FileOptions {
                         continue;
                     };
 
-                    // generate workspace-unique path
+                    // generate workspace-relative path
                     let mut path = db.clone();
                     path.pop();
                     path.push(rel_path);
@@ -175,16 +179,50 @@ impl FileOptions {
                         PresetMeta::FullMime(header.to_string())
                     };
 
-                    self.file_meta.insert(path, preset);
+                    let glob_options = MatchOptions {
+                        case_sensitive: true,
+                        // so there is a difference between "*" and "**".
+                        require_literal_separator: true,
+                        // security measure because entries for .hidden files
+                        // would result in them being exposed.
+                        require_literal_leading_dot: !crate::ARGS.serve_secret,
+                    };
+
+                    // process filename as glob
+                    let paths = if let Some(path) = path.to_str() {
+                        match glob_with(path, glob_options) {
+                            Ok(paths) => paths.collect::<Vec<_>>(),
+                            Err(err) => {
+                                log::error!("incorrect glob pattern: {}", err);
+                                continue;
+                            }
+                        }
+                    } else {
+                        log::error!("path is not UTF-8: {:?}", path);
+                        continue;
+                    };
+
+                    if paths.is_empty() {
+                        // probably an entry for a nonexistent file, glob only works for existing files
+                        self.file_meta.insert(path, preset);
+                    } else {
+                        for glob_result in paths {
+                            match glob_result {
+                                Ok(path) if path.is_dir() => { /* ignore */ }
+                                Ok(path) => {
+                                    self.file_meta.insert(path, preset.clone());
+                                }
+                                Err(err) => {
+                                    log::warn!("could not process glob path: {}", err);
+                                    continue;
+                                }
+                            };
+                        }
+                    }
                 }
             } else {
                 log::error!("no YAML document {:?}", db);
-                return;
-            };
-            self.databases_read.insert(
-                db.as_path().parent().unwrap().to_path_buf(),
-                SystemTime::now(),
-            );
+            }
         } else {
             log::error!("could not read configuration file {:?}", db);
         }
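
The last hunk of src/metadata.rs does two things that are easy to miss in the diff: a rule whose glob matches nothing is stored under its literal path, and a later rule overwrites an earlier one for the same file. The following is a minimal sketch of just that bookkeeping, not the code from the commit. `apply_rule` and its `matches` parameter are hypothetical names chosen for illustration. The commit itself gets the matches from `glob::glob_with`, stores a `PresetMeta` rather than a `String`, and skips directories; all of that is left out so the sketch needs only the standard library.

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

/// Records `preset` for one `.meta` rule (hypothetical helper, not in agate).
///
/// `matches` stands in for the paths that a glob expansion of `pattern`
/// returned; the caller supplies them, which is an assumption of this sketch.
fn apply_rule(
    file_meta: &mut BTreeMap<PathBuf, String>,
    pattern: &str,
    matches: &[PathBuf],
    preset: &str,
) {
    if matches.is_empty() {
        // Nothing on disk matched: keep the rule under its literal path,
        // so an entry for a nonexistent file such as `gone.gmi` still applies.
        file_meta.insert(PathBuf::from(pattern), preset.to_string());
    } else {
        for path in matches {
            // `BTreeMap::insert` replaces any earlier value for the same key,
            // which is what makes the last matching rule win.
            file_meta.insert(path.clone(), preset.to_string());
        }
    }
}
```

Applying a `*.gmi` rule and then an `index.gmi` rule to the same map leaves `index.gmi` with the second preset, while the other files matched by the glob keep the first.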