Comprehensive Review of Clash9 Framework by John Ousterhout
This review by John Ousterhout, from the CS 190 lecture notes (2020), examines the clash9 shell project: its main function and its Clash, ClashParser, Executor, and SubstrParser classes. It then reviews the clash8 parser, lists problems common to the projects (information leakage, special cases, temporal decomposition), and closes with Ousterhout's own token-based parser design.
Review of clash9 John Ousterhout
Classes
  ClashMain:
    int main(int argc, char* argv[], char* envp[]) {
        vector<string> args, envs;
        for (int i = 0; i < argc; ++i) {
            args.push_back(string(argv[i]));
        }
        for (int i = 0; envp[i] != NULL; ++i) {
            envs.push_back(string(envp[i]));
        }
        Clash clash(args, envs);
        return clash.run_clash();
    }
  Clash: Parse command line; read commands; execute builtin commands; manage variables, PATH
  ClashParser: Parse commands
  Executor: Fork/exec; configure pipeline I/O
  SubstrParser: Escape/unescape; search/replace (quote-sensitive)
CS 190 Lecture Notes: Clash Review (2020) Slide 2
Clash, ClashParser, Executor
    class Clash {
        Clash(vector<string>& argv, vector<string>& env);
        int run_clash();
    };
    class ClashParser {
        ClashParser(std::string script);
        vector<Executor::Command> get_next_pipelined_command(
            string clashPath,
            vector<string>& clashEnvironment,
            unordered_map<string, string>& variables);
    };
    class Executor {
        struct Command {...};
        static void execute(
            vector<Command>& pipedCommands,
            vector<string>& env,
            string& output,
            int& exitStatus);
    };
SubstrParser
    SubstrParser(char escapeSymbol);
    string substring_substitution(
        string inputString,
        vector<strLocation>& strLocation,
        unordered_map<string, string>& substitutionMapping);
    string escape_chars(
        string inputString, unordered_set<char>& charsToEscape);
    string unescape_chars(
        string inputString, unordered_set<char>& charsToUnescape);
    vector<SubstrParser::strLocation> find_substrs_between(
        string inputString, char substrSymbol);
    string escape_all_in_substrs(
        string inputString, char substrSymbol,
        bool escapeLast, unordered_set<char> skipArea);
    string unescape_all_in_substrs(
        string inputString, char substrSymbol,
        bool escapeLast, unordered_set<char> skipArea);
    vector<unsigned> search_symbol(
        string inputString, char symbolToSearch);
    string remove_chars(
        string inputString, unordered_set<char>& charsToRemove);
    vector<string> break_into_substrs(
        string inputString, unordered_set<char>& separators);
SubstrParser Example
    echo abc\'def\' 'xyz' quoted 'text'
    escape_all_in_substrs(input, '\'', true, {'\ '});
    echo abc\'def\' \x\y\z' quoted 'text'
  Interfaces are fairly general-purpose, but complex
Example Usage
  Escape:
    parsedResult = parser.escape_all_in_substrs(
        parsedResult, '\'', false, unordered_set<char>({'"', '`'}));
    parsedResult = parser.escape_all_in_substrs(parsedResult, '"');
    parsedResult = parser.escape_all_in_substrs(parsedResult, '`');
  Operate on escaped text:
    vector<unsigned> singleLocs = parser.search_symbol(parsedResult, '\'');
    if (singleLocs.size() > 0) hasSingleQuote = true;
    unordered_set<char> singleQuote({'\''});
    parsedResult = parser.remove_chars(parsedResult, singleQuote);
  Unescape:
    parsedResult = parser.unescape_all_in_substrs(parsedResult, '`');
    parsedResult = parser.unescape_all_in_substrs(parsedResult, '"');
  Hard to keep track of what's in parsedResult
Parser Alternatives
  Left-to-right scan of command:
    Traditional approach to parsing
    Hard to do in a single pass
    May have to remember what was quoted in previous passes
  Whole-command processing (e.g. clash9):
    Series of operations to perform various substitutions, etc.
    "Remove all of the top-level single-quote characters, and backslash every character that was between them"
    Tends to result in complex APIs, more code
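The single-quote operation quoted on this slide can be sketched as one whole-command step. This is a minimal illustration, not code from any of the reviewed projects: `backslash_single_quoted` is a hypothetical helper, and it assumes that a backslash outside single quotes escapes the next character and that single quotes inside double quotes are not top-level.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the reviewed code): removes top-level single
// quotes and puts a backslash in front of every character between them.
std::string backslash_single_quoted(const std::string& input) {
    std::string output;
    bool inSingle = false;   // currently inside '...'
    bool inDouble = false;   // currently inside "..."
    for (std::size_t i = 0; i < input.size(); ++i) {
        char c = input[i];
        if (inSingle) {
            if (c == '\'') {
                inSingle = false;          // closing quote: drop it
            } else {
                output += '\\';            // backslash the quoted character
                output += c;
            }
        } else if (c == '\\' && i + 1 < input.size()) {
            output += c;                   // existing escape: copy both chars
            output += input[++i];
        } else if (c == '"') {
            inDouble = !inDouble;          // double quotes are left in place
            output += c;
        } else if (c == '\'' && !inDouble) {
            inSingle = true;               // opening quote: drop it
        } else {
            output += c;
        }
    }
    return output;
}
```

Even this one step has to know about backslashes and double quotes, and every later step must remember that backslashes may now mean "was single-quoted", which is the bookkeeping burden the slide describes.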
Main/Clash Interface
  Move to main program:
    Parsing command-line arguments
    Reading commands
    Might want to use shell with other command sources?
  Clash interface:
    Execute commands (optionally return stdout)
    Get/set variables
Review of clash8 John Ousterhout
Overall Comments
  2 passes; each pass scans left to right
  Parser code relatively easy to read, economical:
    clash_parser.cpp: 455 lines
    clash_parser.h: 165 lines
    (Only handles single-digit variable names)
  Uses regular expressions for variable name parsing:
    static const std::regex VAR_TERM_REGEX(
        "^(?:(\\$([a-zA-Z][a-zA-Z0-9]*))|(\\$([0-9*#?]))|(\\$\\{(.*?)\\}))");
    ...
    std::regex_search(input.cbegin() + i, input.cend(),
        var_term_match, VAR_TERM_REGEX)
    Reduces code (a bit)
    But, patterns are hard to read
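To make the pattern easier to follow, the sketch below wraps the regular expression from this slide in a small function and names its capture groups. The `VarRef` struct and `match_variable` function are hypothetical names introduced here for illustration; only the pattern and the `std::regex_search` call shape come from the slide.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Result of trying to match a variable reference at one position.
struct VarRef {
    bool found;          // true if a variable reference starts at the position
    std::string name;    // variable name without '$' or braces
    std::size_t length;  // number of input characters the reference occupies
};

// Hypothetical wrapper around the pattern quoted on the slide.
VarRef match_variable(const std::string& input, std::size_t i) {
    static const std::regex VAR_TERM_REGEX(
        "^(?:(\\$([a-zA-Z][a-zA-Z0-9]*))|(\\$([0-9*#?]))|(\\$\\{(.*?)\\}))");
    std::smatch m;
    if (!std::regex_search(input.cbegin() + i, input.cend(), m, VAR_TERM_REGEX)) {
        return {false, "", 0};
    }
    // Group 2: $name, group 4: $digit or special character, group 6: ${...}.
    for (int group : {2, 4, 6}) {
        if (m[group].matched) {
            return {true, m[group].str(), static_cast<std::size_t>(m.length(0))};
        }
    }
    return {false, "", 0};
}
```

Calling `match_variable("$foo.c", 0)` reports the name `foo` with a length of 4, so the caller can resume scanning at the `.` character.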
Parser Passes
  Input:
    myprog < $x.c a $y b | wc > wc.out; echo all done!
  After break_into_pipelines:
    myprog a $y b    < $x.c
    wc               > wc.out
    echo all done!
  After substitute_and_word_break:
    myprog a foo b
    wc
    echo all done!
clash_parser API
  Both APIs exposed to higher-level clash_engine class:
    vector<pipeline> break_into_pipelines(string input_line);
    parsed_words_t substitute_and_word_break(string input);
  Requires extra code in clash_engine:
    pipelines = parser.break_into_pipelines(input_line);
    ...
    for (size_t i = 0; i < pipeline.size(); i++) {
        ...
        Clash_parser::parsed_words_t argv =
            parser.substitute_and_word_break(
                pipeline[i].command_string);
    (must also substitute redirect file names)
  Instead, have parser return fully-substituted pipelines?
Information Leakage
  Knowledge of syntactic elements spread across clash_parser:
    10 checks for
    10 checks for `
    7 checks for \
    4 checks for
    2 checks for $
Variable and Command Substitution
  Variable bindings object passed into parser
  For command substitution, parser creates new clash_engine
    Why not just use existing engine recursively?
  An interesting approach from another project:
    Main class engine provides callbacks to parser for command and variable substitution
    Allows the parser to be separated cleanly from the rest of clash
  Invoke "clash -c" for command substitution?
    Unexported variables won't be visible
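The callback approach mentioned on this slide might look roughly like the sketch below. Everything here is an assumption made for illustration: `SubstitutionCallbacks` and `substitute` are hypothetical names, variable names are limited to alphanumeric characters, and quoting and backslashes are ignored to keep the example short.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical callback bundle supplied by the engine; the parser never
// sees the variable table or the code that runs commands.
struct SubstitutionCallbacks {
    std::function<std::string(const std::string&)> lookupVariable;
    std::function<std::string(const std::string&)> runCommand;
};

// Expands $name and `command` by asking the callbacks for their values.
std::string substitute(const std::string& input, const SubstitutionCallbacks& cb) {
    std::string output;
    std::size_t i = 0;
    while (i < input.size()) {
        char c = input[i];
        if (c == '$') {
            std::size_t end = i + 1;
            while (end < input.size() &&
                   std::isalnum(static_cast<unsigned char>(input[end]))) {
                ++end;
            }
            if (end == i + 1) {                  // lone '$': copy it literally
                output += c;
                ++i;
                continue;
            }
            output += cb.lookupVariable(input.substr(i + 1, end - i - 1));
            i = end;
        } else if (c == '`') {
            std::size_t close = input.find('`', i + 1);
            if (close == std::string::npos) {    // unmatched: copy the rest
                output += input.substr(i);
                break;
            }
            output += cb.runCommand(input.substr(i + 1, close - i - 1));
            i = close + 1;
        } else {
            output += c;
            ++i;
        }
    }
    return output;
}
```

Because the engine passes in its own functions, command substitution can reuse the existing engine (and its unexported variables) without the parser depending on the engine class.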
Can't Scan Lines Backwards
    if (ends_with(line, "\\")) {
        throw Parse_exception { "Input cannot end in a backslash." };
    }
    if (ends_with(line, "|") && !ends_with(line, "\\|")) {
        throw Parse_exception { "Last command cannot end in a pipe." };
    }
  What if line ends in \\|? Or \\\|?
  Only safe way to parse line is left to right
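A left-to-right version of these two checks can be sketched as follows. `LineEnd` and `classify_line_end` are hypothetical names, and the sketch assumes that a backslash escapes exactly the next character; it deliberately ignores quotes, which a real parser would also have to track.

```cpp
#include <cstddef>
#include <string>

enum class LineEnd { Ok, DanglingBackslash, DanglingPipe };

// Hypothetical helper: decides how a line ends by scanning left to right,
// so each backslash is paired with the character it actually escapes.
LineEnd classify_line_end(const std::string& line) {
    bool lastWasUnescapedPipe = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (line[i] == '\\') {
            if (i + 1 == line.size()) {
                return LineEnd::DanglingBackslash;  // nothing left to escape
            }
            ++i;                                    // skip the escaped character
            lastWasUnescapedPipe = false;
        } else {
            lastWasUnescapedPipe = (line[i] == '|');
        }
    }
    return lastWasUnescapedPipe ? LineEnd::DanglingPipe : LineEnd::Ok;
}
```

With this scan, a line ending in one backslash and a pipe is accepted, while a line ending in two backslashes and a pipe is reported as a dangling pipe, because the first backslash escapes the second one.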
Overall Comments
  Lines of code: 1026, 1081, 1190, 1190, 1302, 1544, 1652, 1884, 2641
  Common problems:
    Information leakage
    Many special cases (especially in parser)
    Temporal decomposition
  Opportunities for dividing up functionality:
    Process command line and read commands in main program
    Variable handling
    PATH handling
    Builtin commands
My Parser Implementation
  Goals:
    Left-to-right parsing
    Allow multiple passes
    Keep track of characters that have been quoted!
    Encapsulate knowledge of various syntactic elements
Token and TokenVector
    struct Token {
        char c;
        uint8_t quoted;
        Token(char c, uint8_t quoted = 0);
        Token(const Token& other);
        bool operator==(const char c);
        bool operator!=(const char c);
        static Token emptyChar();
        bool isEmptyChar();
        bool isWordBreak();
        static void appendToString(const Token *token, string *s);
        static void appendToString(const Token *token, size_t size, string *s);
    };
    class TokenVector {
      public:
        TokenVector();
        explicit TokenVector(const char *s);
        explicit TokenVector(const char *s, size_t size);
        explicit TokenVector(string *s);
        explicit TokenVector(Token *token, size_t size);
        Token &operator[](size_t index);
        void append(char c, uint8_t quoted = 0);
        void append(Token token);
        void append(Token *token, size_t size);
        void append(const char *s, size_t size);
        void clear();
        size_t size();
        void quote(size_t first, size_t length);
        void replace(size_t first, size_t count, char *s);
        void truncate(size_t size);
    };
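The value of storing a quoted flag with each character shows up when a second pass runs. The sketch below is a simplified stand-in, not the slide's classes: its `Token` has only a character and a boolean, `std::vector<Token>` replaces `TokenVector`, and `mark_single_quoted` and `break_into_words` are hypothetical pass names. Empty quoted words are not handled.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for the slide's Token: a character plus a flag that
// records whether an earlier pass already quoted it.
struct Token {
    char c;
    bool quoted;
};

// Pass 1: drop single quotes and mark the characters between them as quoted.
std::vector<Token> mark_single_quoted(const std::string& input) {
    std::vector<Token> tokens;
    bool inQuotes = false;
    for (char c : input) {
        if (c == '\'') {
            inQuotes = !inQuotes;
        } else {
            tokens.push_back({c, inQuotes});
        }
    }
    return tokens;
}

// Pass 2: split into words at unquoted spaces only.
std::vector<std::string> break_into_words(const std::vector<Token>& tokens) {
    std::vector<std::string> words;
    std::string current;
    bool inWord = false;
    for (const Token& t : tokens) {
        if (t.c == ' ' && !t.quoted) {
            if (inWord) {                 // an unquoted space ends the word
                words.push_back(current);
                current.clear();
                inWord = false;
            }
        } else {
            current += t.c;               // quoted spaces stay inside the word
            inWord = true;
        }
    }
    if (inWord) {
        words.push_back(current);
    }
    return words;
}
```

The second pass never looks for quote characters; it only consults the flag, so knowledge of single-quote syntax stays in one place.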
Parsing
    class Parser {
      public:
        static Token *parse(Token *input, int flags, Context *context,
                TokenVector *output);
        static const int END_ON_WORDS = 1;
        static const int END_ON_DOUBLE_QUOTE = 2;
        static const int END_ON_SINGLE_QUOTE = 4;
        static const int END_ON_BACK_QUOTE = 8;
        static const int SUBST_VARIABLES = 0x20;
        static const int COPY_VARIABLES = 0x40;
        static const int SUBST_COMMANDS = 0x80;
        static const int COPY_COMMANDS = 0x100;
        static const int SUBST_BACKSLASHES = 0x200;
        static const int COPY_BACKSLASHES = 0x400;
        static const int SUBST_QUOTE_BACKSLASHES = 0x800;
        static const int COPY_QUOTE_BACKSLASHES = 0x1000;
        static const int SUBST_DOUBLE_QUOTES = 0x2000;
        static const int COPY_DOUBLE_QUOTES = 0x4000;
        static const int SUBST_SINGLE_QUOTES = 0x8000;
        static const int COPY_SINGLE_QUOTES = 0x10000;
        static const int SKIP_LEADING_SEPARATORS = 0x20000;
        static const int COPY_WORD = COPY_VARIABLES | COPY_COMMANDS
                | COPY_BACKSLASHES | COPY_DOUBLE_QUOTES
                | COPY_SINGLE_QUOTES | END_ON_WORDS;
Parsing, cont'd
    Token *Parser::parse(Token *input, int flags, Context *context,
            TokenVector *output) {
        for ( ; ; input++) {
            if (input->quoted) {
                output->append(*input);
                continue;
            }
            switch (input->c) {
                ...
                case '"':
                    if (flags & SUBST_DOUBLE_QUOTES) {
                        ...
                        parse(input+1, SUBST_VARIABLES | SUBST_COMMANDS
                                | SUBST_QUOTE_BACKSLASHES | END_ON_DOUBLE_QUOTE,
                                context, output);
                        ...
                    } else if (flags & COPY_DOUBLE_QUOTES) {
                        ...
                        parse(input+1, COPY_VARIABLES | COPY_COMMANDS
                                | COPY_QUOTE_BACKSLASHES | END_ON_DOUBLE_QUOTE,
                                context, output);
                        ...
                    } else if (flags & END_ON_DOUBLE_QUOTE) {
                        goto done;
                    }
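The flag-driven recursion in this excerpt can be tried out on a much smaller scale. The sketch below is not the slide's parser: `scan` is a hypothetical function that works on plain C strings rather than tokens, handles only double quotes and word breaks, and reuses four flag names with the same values. It assumes the input is NUL-terminated.

```cpp
#include <string>

// Hypothetical flags; names and values mirror a subset of those on the slide.
constexpr int END_ON_WORDS = 1;
constexpr int END_ON_DOUBLE_QUOTE = 2;
constexpr int SUBST_DOUBLE_QUOTES = 0x2000;
constexpr int COPY_DOUBLE_QUOTES = 0x4000;

// Scans from p until an end condition selected by flags (or the terminating
// NUL) and appends the processed text to *out. Returns the stopping position.
const char* scan(const char* p, int flags, std::string* out) {
    for (; *p != '\0'; ++p) {
        if (*p == '"') {
            if (flags & SUBST_DOUBLE_QUOTES) {
                // Drop the quotes; the nested call stops at the closing quote.
                p = scan(p + 1, END_ON_DOUBLE_QUOTE, out);
            } else if (flags & COPY_DOUBLE_QUOTES) {
                // Keep the quotes so that a later pass can still see them.
                out->push_back('"');
                p = scan(p + 1, END_ON_DOUBLE_QUOTE, out);
                if (*p == '"') out->push_back('"');
            } else if (flags & END_ON_DOUBLE_QUOTE) {
                return p;
            } else {
                out->push_back(*p);
            }
            if (*p == '\0') return p;     // unterminated quote: stop at the end
        } else if (*p == ' ' && (flags & END_ON_WORDS)) {
            return p;
        } else {
            out->push_back(*p);
        }
    }
    return p;
}
```

The same function serves both the substituting pass and the copying pass; the caller chooses the behavior through flags, and the nested call uses END_ON_DOUBLE_QUOTE so that spaces inside the quotes do not end the word.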