Review/std.d.lexer

From D Wiki
Revision as of 14:44, 2 October 2013 by Dicebot (talk | contribs) (Current state)

Description

std.d.lexer is a proposed standard module for lexing D code, written by Brian Schott.

Related links

  1. Official lexing specification
  2. Documentation
  3. Example project that uses std.d.lexer

Current state

Voting in progress: http://forum.dlang.org/post/eeenynxifropasqcufdg@forum.dlang.org

Review 1

Result

No critical concerns were raised (in the review manager's opinion), but Brian decided to address some of the smaller ones before putting the module to a vote.

Description

Probably the longest part of the discussion was about requiring lookahead in the lexer and the related complexity and performance implications. The outcome of that discussion was not clear.

Other comments/proposals:

  1. using a _ suffix for keywords (listed separately because it raised some discussion)
  2. various other naming issues
  3. general lack of documentation
  4. TokenType definition (performance concerns)
  5. providing a benchmark for comparison with other lexers out of the box
  6. splitting the module and converting it into a package
  7. tests need more coverage / comments
  8. sharing identifier pool between multiple lexers


Changes so far

https://github.com/Hackerpilot/phobos/commit/9bdb7f97bb8021f3b0d0291896b8fe21a6fead23#std/d/lexer.d :

  • There are a few more unit tests now
  • bitAnd renamed to amp
  • slice renamed to dotdot
  • Much more cross-referencing in the doc comments
  • Start line and column can be specified in the lexer config

https://github.com/Hackerpilot/phobos/compare/D-Programming-Language:df38839...master

Clarifications about concerns

   Renaming tokens:
       I feel that this is consistent with Phobos' style guidelines and does not
       need to be changed. Members of the D community are free to vote no on
       inclusion of this module if they disagree strongly enough.
   
   Defining tokens in terms of templates:
       I'm not sure how this would impact performance of the lexer. Given the
       amount of effort that went into optimizing the lexer on the part of Dmitry,
       myself, and others, I'm not willing to try changing something like this.
       If someone else is strongly in favor of the Tok!"=" style, they can
       implement it and benchmark it against the current implementation.
   
   Splitting the lexer module into smaller sub-modules:
       I don't feel that this is worth the complexity that would be added.
   
   Using slices of a giant string constant to speed things up:
       This didn't actually speed things up.
   
   Changing the lexer to throw a ParseException instead of Exception:
       This can be revisited when Phobos has a standard exception hierarchy.
   
   Creating a parser generator:
       This is not in the scope of a lexer module review. 

(c) Brian
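
For readers unfamiliar with the Tok!"=" style mentioned in the clarifications, the following is a minimal sketch of the general idea: a template maps a token's spelling to a small integer at compile time, so code can write Tok!"=" instead of a named enum member. The names tokenTable and tokenIndex, and the four-entry table, are hypothetical and chosen only for illustration. This is not the API of std.d.lexer, and it says nothing about how either style performs.

```d
import std.stdio : writeln;

// Hypothetical token table; a real D lexer has a much larger token set.
enum string[] tokenTable = ["=", "==", "&", ".."];

// Hypothetical helper, evaluated at compile time (CTFE):
// returns the position of `symbol` in tokenTable, or -1 if absent.
ptrdiff_t tokenIndex(string symbol)
{
    foreach (i, s; tokenTable)
        if (s == symbol)
            return cast(ptrdiff_t) i;
    return -1;
}

// Tok!"=" expands to a compile-time constant; an unknown
// spelling is rejected when the template is instantiated.
template Tok(string symbol)
{
    static assert(tokenIndex(symbol) >= 0, "unknown token: " ~ symbol);
    enum ubyte Tok = cast(ubyte) tokenIndex(symbol);
}

void main()
{
    ubyte type = Tok!"..";

    // The constants are usable as case labels, like enum members.
    switch (type)
    {
        case Tok!"=":  writeln("assign"); break;
        case Tok!"..": writeln("dotdot"); break;
        default:       writeln("other");  break;
    }
}
```

With this approach a misspelled token such as Tok!"=>" fails at compile time instead of silently producing a wrong value. Whether it is faster or slower than a plain enum would have to be measured, as the clarification above suggests.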