Mirror of https://we.phorge.it/source/phorge.git (synced 2025-01-18 18:51:12 +01:00)
Add documentation about the script and regex linter to the user guide.

Summary: The big, gigantic comment about the script and regex linter belongs in a more obvious place. I think this is a more obvious place. I also cleaned up a couple things. I'll update D9084 to remove the big comment block and point here instead.

Test Plan: `bin/diviner generate --book src/docs/book/user.book`

Reviewers: #blessed_reviewers, epriestley

Reviewed By: #blessed_reviewers, epriestley

Subscribers: epriestley, Korvin

Differential Revision: https://secure.phabricator.com/D9100

Parent: cff721c657
Commit: 0ab192d245
2 changed files with 155 additions and 0 deletions

@@ -413,5 +413,7 @@ Continue by:

  - integrating and customizing built-in linters and lint bindings with
    @{article:Arcanist User Guide: Customizing Existing Linters}; or
  - using a linter that hasn't been integrated into Arcanist with
    @{article:Arcanist User Guide: Script and Regex Linter}; or
  - learning how to add new linters and lint engines with
    @{article:Arcanist User Guide: Customizing Lint, Unit Tests and Workflows}.

src/docs/user/userguide/arcanist_lint_script_and_regex.diviner (Normal file, 153 lines added)
@@ -0,0 +1,153 @@

@title Arcanist User Guide: Script and Regex Linter
@group userguide

Explains how to use the Script and Regex linter to invoke an existing
lint engine that is not integrated with Arcanist.

The Script and Regex linter is a simple glue linter which runs some
script on each path, and then uses a regex to parse lint messages from
the script's output. (This linter uses a script and a regex to
interpret the results of some real linter; it does not itself lint
scripts or regexes.)

Configure this linter by setting these keys in your configuration:

  - `script-and-regex.script` Script command to run. This can be
    the path to a linter script, but may also include flags or use shell
    features (see below for examples).
  - `script-and-regex.regex` The regex to process output with. This
    regex uses named capturing groups (detailed below) to interpret output.

The script will be invoked from the project root, so you can specify a
relative path like `scripts/lint.sh` or an absolute path like
`/opt/lint/lint.sh`.

This linter is necessarily more limited in its capabilities than a normal
linter which can perform custom processing, but may be somewhat simpler to
configure.
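
As a rough mental model of how the two keys fit together, the sketch below runs a configured command on one path and applies a regex to whatever the command prints. It is an illustration in Python, not Arcanist's implementation (which is PHP). The function name `run_script_and_regex`, the shell-quoting of the path, and the use of Python's `re` syntax in place of a delimited PCRE pattern are all assumptions made for the example.

```python
import re
import shlex
import subprocess


def run_script_and_regex(script, pattern, path, cwd=None):
    """Run `script` on `path` and return one dict of named groups per match."""
    # Assumption: the path is appended to the configured command as a final,
    # shell-quoted argument.
    command = script + " " + shlex.quote(path)
    result = subprocess.run(
        command,
        shell=True,
        cwd=cwd,                    # the guide says the project root is used
        stdout=subprocess.PIPE,     # only stdout is parsed
        stderr=subprocess.DEVNULL,  # stderr is ignored
        text=True,
        check=True,                 # a nonzero exit raises an exception
    )
    # The regex is applied to the whole output, one match per message.
    return [m.groupdict()
            for m in re.finditer(pattern, result.stdout, re.MULTILINE)]
```

Calling `run_script_and_regex("scripts/lint.sh", r"^(?P<message>.+)$", "src/a.c")` would, under these assumptions, return one dictionary per nonempty output line.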

== Script... ==

The script will be invoked once for each file that is to be linted, with
the path to the file passed as the first argument after the configured
command. The path may begin with a "-"; ensure your script will not interpret
such paths as flags (perhaps by ending your script configuration with "--",
if its argument parser supports that).

Note that when run via `arc diff`, the list of files to be linted includes
deleted files and files that were moved away by the change. The linter should
not assume the path it is given exists, and it is not an error for the
linter to be invoked with paths which are no longer there. (Every affected
path is subject to lint because some linters may raise errors in other files
when a file is removed, or raise an error about its removal.)

The script should emit lint messages to stdout, which will be parsed with
the provided regex.

For example, you might use a configuration like this:

  "script-and-regex.script": "/opt/lint/lint.sh --flag value --other-flag --"
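
To make the contract above concrete, here is a minimal sketch of a lint script that follows it. Everything about it is hypothetical: the 80-character rule, the `warning:<line> <text>` output format, and the names `lint` and `main` are invented for illustration. It accepts the path as a positional argument, tolerates a leading "--", reports nothing for paths that no longer exist, writes messages to stdout, and exits 0.

```python
#!/usr/bin/env python3
"""Hypothetical lint script: flags lines longer than 80 characters."""
import argparse
import os
import sys

MAX_LENGTH = 80  # arbitrary limit chosen for this sketch


def lint(path):
    """Return messages formatted as 'warning:<line> <text>'."""
    if not os.path.isfile(path):
        # Deleted or moved-away paths are still passed in; report nothing.
        return []
    messages = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            length = len(line.rstrip("\n"))
            if length > MAX_LENGTH:
                messages.append(
                    f"warning:{number} Line is {length} characters long.")
    return messages


def main(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("path")
    # argparse treats everything after "--" as positional, so a configured
    # command ending in "--" keeps a path like "-n.txt" from parsing as a flag.
    args = parser.parse_args(argv)
    for message in lint(args.path):
        print(message)
    return 0  # always exit 0; lint findings are not a script failure


# With no arguments there is nothing to lint, so do nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

A matching regex for this invented format could be `/^(?P<severity>warning):(?P<line>\d+) (?P<message>.*)$/m`.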

stderr is ignored. If you have a script which writes messages to stderr,
you can redirect stderr to stdout by using a configuration like this:

  "script-and-regex.script": "sh -c '/opt/lint/lint.sh \"$0\" 2>&1'"

The return code of the script must be 0, or an exception will be raised
reporting that the linter failed. If you have a script which exits nonzero
under normal circumstances, you can force it to always exit 0 by using a
configuration like this:

  "script-and-regex.script": "sh -c '/opt/lint/lint.sh \"$0\" || true'"
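
Both wrappers above rely on the same shell behavior: with `sh -c 'command' arg`, the first argument after the command string becomes `$0`, so the appended path reaches the real linter as `"$0"`. (The `\"` sequences are JSON escapes; the shell sees plain double quotes.) The sketch below demonstrates this from Python with a stand-in linter. `FAKE_LINTER` and `run_wrapped` are made up for the example, and a POSIX `sh` is assumed to be on the path.

```python
import subprocess


def run_wrapped(shell_snippet, path):
    """Run `sh -c <snippet> <path>`; sh binds the extra argument to $0."""
    return subprocess.run(
        ["sh", "-c", shell_snippet, path],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)


# Stand-in for a linter that reports on stderr and exits nonzero.
FAKE_LINTER = '(echo "error:1 problem in $0" >&2; exit 2)'

plain = run_wrapped(FAKE_LINTER, "src/a.c")
merged = run_wrapped(FAKE_LINTER + " 2>&1", "src/a.c")
forgiving = run_wrapped(FAKE_LINTER + " || true", "src/a.c")
both = run_wrapped(FAKE_LINTER + " 2>&1 || true", "src/a.c")

print(repr(plain.stdout), plain.returncode)    # '' 2
print(repr(merged.stdout), merged.returncode)  # 'error:1 problem in src/a.c\n' 2
print(forgiving.returncode)                    # 0
print(repr(both.stdout), both.returncode)      # 'error:1 problem in src/a.c\n' 0
```

The last form combines the two fixes, which is what a linter that both writes to stderr and exits nonzero would need.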

Multiple instances of the script will be run in parallel if there are
multiple files to be linted, so they should not rely on any single shared
resource, such as a fixed output file. For instance, this configuration would
not work properly, because several processes may attempt to write to the file
at the same time:

  COUNTEREXAMPLE
  "script-and-regex.script": "sh -c '/opt/lint/lint.sh --output /tmp/lint.out \"$0\" && cat /tmp/lint.out'"
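
One way around the problem in the counterexample is a small wrapper that gives each invocation its own output file. The sketch below is a hedged illustration, not a recommended tool: the `--output` flag simply mirrors the hypothetical flag in the counterexample, and `lint_with_private_output` is an invented name.

```python
import os
import subprocess
import sys
import tempfile


def lint_with_private_output(linter_command, path):
    """Run a linter that writes its report to a file, using a file that no
    other invocation shares, and return what it wrote."""
    # mkstemp creates a uniquely named file, so parallel runs never collide.
    handle, output_path = tempfile.mkstemp(prefix="lint-", suffix=".out")
    os.close(handle)
    try:
        # `--output` is the hypothetical flag from the counterexample.
        subprocess.run(linter_command + ["--output", output_path, path],
                       check=False)
        with open(output_path, encoding="utf-8") as report:
            return report.read()
    finally:
        os.remove(output_path)


# Used as a script: print the report for the path given as the argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.stdout.write(
        lint_with_private_output(["/opt/lint/lint.sh"], sys.argv[1]))
```

Saved as, say, `scripts/lint-wrapper.py` (a made-up location), the wrapper itself would become the value of `script-and-regex.script`.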

There are necessary limits to how gracefully this linter can deal with
edge cases, because it is just a script and a regex. If you need to do
things that this linter can't handle, you can write a phutil linter and move
the logic to handle those cases into PHP. PHP is a better general-purpose
programming language than regular expressions are, if only by a small margin.

== ...and Regex ==

The regex must be a valid PHP PCRE regex, including delimiters and flags.

The regex will be matched against the entire output of the script, so it
should generally be in this form if messages are one-per-line:

  /^...$/m
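
The trailing `m` matters because the regex sees the whole output at once. The short Python comparison below shows the effect. Python's `re.MULTILINE` plays the role of PCRE's `m` flag here, and the sample output is invented.

```python
import re

# Two-line output from a hypothetical lint script.
output = "error:13 Too many goats!\nwarning:22 Not enough boats."

# Without the multiline flag, ^ and $ anchor to the whole output, so a
# per-line pattern finds nothing once there is more than one line.
without_m = re.findall(r"^(.+)$", output)

# With it, ^ and $ also match at each line boundary.
with_m = re.findall(r"^(.+)$", output, re.MULTILINE)

print(without_m)  # []
print(with_m)     # ['error:13 Too many goats!', 'warning:22 Not enough boats.']
```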

The regex should capture these named patterns with `(?P<name>...)`:

  - `message` (required) Text describing the lint message. For example,
    "This is a syntax error.".
  - `name` (optional) Text summarizing the lint message. For example,
    "Syntax Error".
  - `severity` (optional) The word "error", "warning", "autofix", "advice",
    or "disabled", in any combination of upper and lower case. Alternatively,
    you may match groups called `error`, `warning`, `advice`, `autofix`, or
    `disabled`. These allow you to match output formats like "E123" and
    "W123" to indicate errors and warnings, even though the word "error" is
    not present in the output. If no severity capturing group is present,
    messages are raised with "error" severity. If multiple severity capturing
    groups are present, messages are raised with the highest captured
    severity. Capturing groups like `error` supersede the `severity`
    capturing group.
  - `error` (optional) Match some nonempty substring to indicate that this
    message has "error" severity.
  - `warning` (optional) Match some nonempty substring to indicate that this
    message has "warning" severity.
  - `advice` (optional) Match some nonempty substring to indicate that this
    message has "advice" severity.
  - `autofix` (optional) Match some nonempty substring to indicate that this
    message has "autofix" severity.
  - `disabled` (optional) Match some nonempty substring to indicate that this
    message has "disabled" severity.
  - `file` (optional) The name of the file to raise the lint message in. If
    not specified, defaults to the linted file. It is generally not necessary
    to capture this unless the linter can raise messages in files other than
    the one it is linting.
  - `line` (optional) The line number of the message.
  - `char` (optional) The character offset of the message.
  - `offset` (optional) The byte offset of the message. If captured, this
    supersedes `line` and `char`.
  - `original` (optional) The text the message affects.
  - `replacement` (optional) The text that should automatically replace the
    range captured by `original` to resolve the message.
  - `code` (optional) A short error type identifier which can be used
    elsewhere to configure handling of specific types of messages. For
    example, "EXAMPLE1", "EXAMPLE2", etc., where each code identifies a
    class of message like "syntax error", "missing whitespace", etc. This
    allows configuration to later change the severity of all whitespace
    messages, for example.
  - `ignore` (optional) Match some nonempty substring to ignore the match.
    You can use this if your linter sometimes emits text like "No lint
    errors".
  - `stop` (optional) Match some nonempty substring to stop processing input.
    Remaining matches for this file will be discarded, but linting will
    continue with other linters and other files.
  - `halt` (optional) Match some nonempty substring to halt all linting of
    this file by any linter. Linting will continue with other files.
  - `throw` (optional) Match some nonempty substring to throw an error, which
    will stop `arc` completely. You can use this to fail abruptly if you
    encounter unexpected output. All processing will abort.

Numbered capturing groups are ignored.
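
The severity rules above can be read as a small decision procedure. The sketch below is one interpretation of that text, written in Python rather than taken from Arcanist. The ranking in `SEVERITY_ORDER` is an assumption, since the text says "highest" without listing an order. The "E123"-style output format and the fallback for unrecognized severity words are likewise invented for the example.

```python
import re

# Assumed ranking, highest first; the text does not spell the order out.
SEVERITY_ORDER = ["error", "warning", "autofix", "advice", "disabled"]


def resolve_severity(groups):
    """Pick a severity from a match's named groups (a dict of name -> text)."""
    # Dedicated groups such as `error` win over the `severity` group.
    for name in SEVERITY_ORDER:
        if groups.get(name):
            return name
    word = (groups.get("severity") or "").lower()
    if word in SEVERITY_ORDER:
        return word
    # Nothing usable captured: fall back to "error" (for unrecognized words
    # this fallback is an assumption of the sketch).
    return "error"


# Hypothetical output format: "<E|W><digits> <line>: <text>".
PATTERN = re.compile(
    r"^(?P<code>(?:(?P<error>E)|(?P<warning>W))\d+) "
    r"(?P<line>\d+): (?P<message>.*)$",
    re.MULTILINE)

output = "E123 13: Too many goats!\nW456 22: Not enough boats."
for match in PATTERN.finditer(output):
    groups = match.groupdict()
    print(resolve_severity(groups), groups["code"], groups["line"],
          groups["message"])
# error E123 13 Too many goats!
# warning W456 22 Not enough boats.
```

Here the `error` and `warning` groups each capture a single letter, which is enough: any nonempty capture selects that severity.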

For example, if your lint script's output looks like this:

  error:13 Too many goats!
  warning:22 Not enough boats.

...you could use this regex to parse it:

  /^(?P<severity>warning|error):(?P<line>\d+) (?P<message>.*)$/m
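
A quick way to try a pattern like this before putting it in your configuration is to run it over sample output. The sketch below does that in Python, whose `re` module accepts the same `(?P<name>...)` syntax. The PCRE delimiters and trailing `m` are replaced by a raw string and `re.MULTILINE`, which is a translation made for this example and not something Arcanist does.

```python
import re

# The pattern from the text; PCRE's /.../m becomes re.MULTILINE in Python.
PATTERN = re.compile(
    r"^(?P<severity>warning|error):(?P<line>\d+) (?P<message>.*)$",
    re.MULTILINE)

output = "error:13 Too many goats!\nwarning:22 Not enough boats.\n"

# One dictionary of named groups per lint message.
messages = [m.groupdict() for m in PATTERN.finditer(output)]
for message in messages:
    print(message["severity"], message["line"], message["message"])
# error 13 Too many goats!
# warning 22 Not enough boats.
```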

The simplest valid regex for line-oriented output is something like this:

  /^(?P<message>.*)$/m
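
With a catch-all pattern like this, every output line becomes a message, including status lines such as "No lint errors". The sketch below shows one hypothetical way to pair the catch-all with the `ignore` group described earlier. It is written in Python for illustration. It assumes that a match with a nonempty `ignore` group is dropped before `message` is examined, and it uses `.+` rather than `.*` so that empty lines do not match.

```python
import re

# Hypothetical pattern: either an ignorable status line or a message.
PATTERN = re.compile(
    r"^(?:(?P<ignore>No lint errors\.)|(?P<message>.+))$",
    re.MULTILINE)


def parse(output):
    """Return message texts, skipping matches whose `ignore` group is set."""
    return [m.group("message")
            for m in PATTERN.finditer(output)
            if not m.group("ignore")]


print(parse("No lint errors."))
# []
print(parse("Too many goats!\nNot enough boats."))
# ['Too many goats!', 'Not enough boats.']
```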