|Title:||Automatic downloading of imports|
|Author:||Andrei Alexandrescu (andrei at erdani dot com), Steven Schveighoffer (schveiguy at yahoo dot com)|
- 1 Summary
- 2 Scope
- 3 Import Path Update / URL specification
- 4 Impact
- 5 Language feature
- 6 Trust model
- 7 Possible security by signed hashes
- 8 Using Package qualifier to specify package
- 9 Using Package qualifier to specify module
- 10 Transitivity and Conflict Resolution
- 11 Linking
- 12 __FILE__
- 13 External Import Tool
- 14 Caching
- 15 Packaging
- 16 Protocols
- 17 Alternatives
Currently there is no systematic way of specifying and loading of remote libraries. This proposal adds compiler features and a tool that allows the compiler to automatically download and install import files on an as-needed basis.
We hope that this feature will enable painless, simple building of programs with interdependent remote libraries. There would be no need for the user to specifically download files, expand archives, etc.
The proposed system does not impose versioning policy; instead, it allows code to enact it in URL structure.
Currently the system is intended for source libraries only. Future additions should address binary distributions.
Import Path Update / URL specification
Required DIP: DIP13 (import path binding)
The import path spec is changed to:
If the import path is specified as a url, dmd will attempt to get the file via the method specified in the url prefix (e.g. http://) The mechanism for accessing this path may be delegated to a helper tool (see below).
Urls conflict with the path list separator on POSIX ':'. Currently it is possible to use -I/path/foo:/path/bar. This should vanish on all platforms and can easily be replaced with -I/path/foo -I/path/bar.
A new pragma "importpath" is introduced:
This means, when compiling this file (and this file only), assume -I<import_path> was passed to the compiler before any other paths. If this file imports another file, the pragma does not apply for the imported file's imports.
There is an alternative proposal DIP14 to provide better language integration.
The user trusts the paths given on the command line (including URLs) or in config files, or as pragmas inside the source code.
dmd -Ihttp://a.com -Ihttp://b.org ...
means that the user allows downloading any D code from a.com and b.org during compilation.
No provisions are currently made for a user running the compiler on an already compromised machine (e.g. compiler, DNS server, system libraries).
Possible security by signed hashes
By providing hashes (e.g. SHA1, SHA256 or even multiple hashes of different kinds for redundancy) of the files that are signed by the lib developer with his GPG key, compromised files could easily be detected (as long as the lib developers private GPG is not on the compromised server or otherwise leaked).
However this implies that the user needs to either specify the developers GPG key (or something that identifies it) with the pragma for obtaining the lib or maintain a keyring of all trusted developers, else anyone can sign the code with his own GPG key.
The syntax for this has not been decided, or whether this will be included.
Using Package qualifier to specify package
With the include path specified as acme.widgets=http://acme.com/dlibs/widgets either via -I on the command line or via a pragma(importpath, ...) statement inside a source file, if subsequently this import is seen:
then, "http://acme.com/dlibs/widgets/square.di" and then "http://acme.com/dlibs/widgets/square.d" are attempted for download. If either of these fails, an error is generated (no other paths are tried).
Note that this applies for sub-packages too, i.e. all packages and modules having acme.widgets as a prefix will be looked up starting from URL "http://acme.com/dlibs/widgets" and adding slashes appropriately as dots are found in the imported alias. For example:
will search URL "http://acme.com/dlibs/widgets/enhanced/posix/" for files "circle.di" and then "circle.d". If neither is found, an error is generated.
Using Package qualifier to specify module
With the path specified above, if subsequently this import is seen:
then "http://acme.com/dlibs/widgets" is attempted for download (no d or .di is added). If this fails, an error is generated. Note that the download helper tool (see below) implements these features.
Transitivity and Conflict Resolution
A downloaded module may specify its own pragma(importpath, ...), which will be honored during compilation of that module (either during import or standalone).
If two import paths specify the same alias, they are in conflict unless the mapped URLs are identical. Example:
// module 1 pragma(importpath, "acme.widgets=http://acme.com/widgets/v1.0"); // module 2 pragma(importpath, "acme.widgets=http://acme.com/widgets/v2.3");
This is in order to avoid dependencies on two versions of the same library/file. Note that this only applies when compiling multiple modules in the same compiler invocation. If these files are compiled separately, and they do not import each other, the compiler cannot detect the conflict and will not error.
Although dmd does download remote modules, it will otherwise treat them like regular files, so it will issue linker errors unless the downloaded modules are also added to the build. However, in conjunction with an appropriate download protocol and pragma(lib, ...), the imported source can implement downloading of a compiled version of the file for linking. Using rdmd in conjunction to this feature will enable seamless builds with remote components.
In a downloaded module, __FILE__ will correspond to the url used to download the module.
External Import Tool
In order to facilitate downloading of modules in an extensible way without impacting the compiler source, the compiler may rely on an external tool to acquire the import. The tool path may be specified on the command line or in the configuration file. It's API will be as follows:
tool -Iurl [modulename]
Note that modulename will not include the package qualifier. For example, if the import path is specified as:
And a file imports acme.widgets.square, the following command will be executed:
tool -Ihttp://acme.com/widgets square
If the package qualifier encompasses the entire import (i.e. import acme.widgets), then the command line is:
tool -Ihttp://acme.com/widgets .
If the tool finds the file, it will output the following:
where url-resolved is the exact url used to download the module (to be used as __FILE__ per above) and return 0.
if the file cannot be found, it will output nothing, and return non-zero, which will cause the compiler to generate an error.
It is important to note that the compiler will try paths in order specified, which means a module could be in a local path, but an earlier path specification contains a remote url, the remote url will be tried before the local paths. That is, the additional abilities to specify remote paths does not affect how the compiler searches the import path.
To prevent re-fetching source files that likely are very static, the download tool can cache the files fetched using urls for future use. If any file changes remotely, the user will have to clear the cache, or if it can detect the changes easily, it can purge the cache without assistance.
The method of caching is not specified, nor is whether the tool should cache the absence of a file.
It is strongly advised that a specific package system be used for downloading both import source and compiled library for a given package. This allows one to easily specfiy pragma(lib) in the downloaded source to automatically link the file. The tool may have to edit the file used to link in the source file downloaded, as the linker is not going to be instrumented to know where to find the library.
This proposal only specifies that http be a possible protocol used by this tool. However, other protocols may be used as long as they can be specified in url form and the download tool can support them.
This arguably extralinguistic feature could be addressed by a tool included with the standard distribution. The tool would use traditional package and library management, along with installing additional files, running installation scripts, etc. Alternatively, a general OS-specific package management can be used.
This proposal does not aim at obviating all needs for such a tool, but instead it tries to (a) lower the barrier of entry for common code sharing scenarios, (b) allow self-contained, headache-free libraries, and (c) help and simplify construction of larger-scale tools.
It is worth noting that the compiler is the only tool that has exact knowledge and control over what modules are being imported. An external tool would need to maintain package description files in separation from the source code, or at best could run the compiler multiple times and incrementally download libraries as they are required. For such an approach refer to Adam Ruppe's message. His program is significantly slower than a compiler-integrated approach.
It's also possible to put the download tool code directly in the compiler, but this has several drawbacks:
- The compiler is written in C++, so this tool would also have to be.
- The compiler arguably should not be "infiltrated" by code that has nothing to do with compiling.
- There are already boatloads of programs/libraries that solve the problem of downloading files, we shouldn't be reinventing the wheel here.
- Adding a method for downloading does not require a rebuild of the compiler.
- The benefit of having the compiler actually do the downloading is that there would be no need to start an extra process.
- However, the performance impact should be quite minimal, especially with caching by the download tool.