Source Code Resolver
Resolve source code locations for packages identified by their Package URL (PURL).
| Status | Open |
| Type | Closed-Source Collaborative |
| Eligible Courses | SEMI, PRAK/PROJ |
| Scope | 5 ECTS - 10 ECTS |
| Use of Artificial Intelligence | Allowed |
| CLA Required? | Yes |
Summary
CRA Tool receives a list of Package URLs (PURLs) for a codebase. Metadata for these packages is resolved from package registries, whose source code information is often incomplete, inaccurate, or missing entirely (e.g. a repository URL without a specific revision, or a link that no longer exists). Access to the actual source code is crucial for downstream license, copyright, and open-source health analysis.
The goal of this contribution is to build a source code resolver that reliably maps a PURL to a source code repository and, ideally, to the exact revision matching the package version.
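To make the input concrete, the following is a minimal sketch of how a PURL could be split into the components the resolver needs (type, namespace, name, version). It is not a complete implementation of the PURL specification: qualifiers and subpath are dropped, type-specific normalization is skipped, and the `Purl` record is a hypothetical name chosen for illustration. A real implementation would more likely rely on an existing PURL library.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical, simplified view of a PURL; qualifiers and subpath are dropped. */
record Purl(String type, Optional<String> namespace, String name, Optional<String> version) {

    /** Parses "pkg:type/namespace/name@version?qualifiers#subpath" (simplified). */
    static Purl parse(String raw) {
        if (!raw.startsWith("pkg:")) {
            throw new IllegalArgumentException("Not a PURL: " + raw);
        }
        String rest = raw.substring("pkg:".length());

        // Drop the optional subpath and qualifiers; this sketch does not need them.
        int hash = rest.lastIndexOf('#');
        if (hash >= 0) rest = rest.substring(0, hash);
        int question = rest.lastIndexOf('?');
        if (question >= 0) rest = rest.substring(0, question);

        // Leading slashes after the scheme are not significant.
        while (rest.startsWith("/")) rest = rest.substring(1);

        int typeEnd = rest.indexOf('/');
        if (typeEnd <= 0) {
            throw new IllegalArgumentException("Missing type or name: " + raw);
        }
        String type = rest.substring(0, typeEnd).toLowerCase(Locale.ROOT);
        rest = rest.substring(typeEnd + 1);

        // The version follows the last unencoded '@'.
        Optional<String> version = Optional.empty();
        int at = rest.lastIndexOf('@');
        if (at >= 0) {
            version = Optional.of(decode(rest.substring(at + 1)));
            rest = rest.substring(0, at);
        }

        // The last path segment is the name; anything before it is the namespace.
        int nameStart = rest.lastIndexOf('/');
        String name = decode(rest.substring(nameStart + 1));
        if (name.isEmpty()) {
            throw new IllegalArgumentException("Missing name: " + raw);
        }
        Optional<String> namespace = nameStart > 0
                ? Optional.of(decode(rest.substring(0, nameStart)))
                : Optional.empty();
        return new Purl(type, namespace, name, version);
    }

    private static String decode(String s) {
        // URLDecoder treats '+' as a space, which PURLs do not; protect it first.
        return URLDecoder.decode(s.replace("+", "%2B"), StandardCharsets.UTF_8);
    }
}
```

For example, `Purl.parse("pkg:maven/org.apache.commons/commons-lang3@3.12.0")` yields the type `maven`, the namespace `org.apache.commons`, the name `commons-lang3`, and the version `3.12.0`.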
This involves several steps. First, existing databases and services that map packages to source code locations should be evaluated (e.g. ClearlyDefined, Software Heritage, deps.dev). Second, heuristic strategies should be developed to discover source code locations from other metadata fields such as the homepage, bug tracker, or VCS URLs provided by the registry. Finally, once a repository is identified, the resolver should use Git or GitHub APIs to find tags that match the PURL version, pinpointing the concrete revision for analysis.
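The final step, matching tags to a version, can be sketched as follows. This is an illustration under stated assumptions rather than a prescribed design: it assumes the tag list comes from the output of `git ls-remote --tags <repository-url>`, and the tag naming patterns tried here (`1.2.3`, `v1.2.3`, `name-1.2.3`, `name@1.2.3`, `name/v1.2.3`) are common conventions, not a complete list. `TagMatcher` and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper that maps a package version to a commit via Git tags. */
final class TagMatcher {

    private TagMatcher() {}

    /** Candidate tag names in order of preference; assumed conventions, not exhaustive. */
    static List<String> candidates(String packageName, String version) {
        return List.of(
                version,                       // 1.2.3
                "v" + version,                 // v1.2.3
                packageName + "-" + version,   // name-1.2.3
                packageName + "@" + version,   // name@1.2.3
                packageName + "/v" + version); // name/v1.2.3
    }

    /** Parses `git ls-remote --tags` output into a map of tag name to commit SHA. */
    static Map<String, String> parseLsRemote(String output) {
        Map<String, String> tagToCommit = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            String[] parts = line.split("\t", 2);
            if (parts.length != 2 || !parts[1].startsWith("refs/tags/")) {
                continue; // not a tag ref
            }
            String tag = parts[1].substring("refs/tags/".length());
            if (tag.endsWith("^{}")) {
                // Peeled entry: the commit an annotated tag points to. Prefer it.
                tagToCommit.put(tag.substring(0, tag.length() - 3), parts[0]);
            } else {
                tagToCommit.putIfAbsent(tag, parts[0]);
            }
        }
        return tagToCommit;
    }

    /** Returns the commit SHA of the first candidate tag that exists, if any. */
    static Optional<String> resolveRevision(String packageName, String version,
                                            String lsRemoteOutput) {
        Map<String, String> tags = parseLsRemote(lsRemoteOutput);
        for (String candidate : candidates(packageName, version)) {
            String commit = tags.get(candidate);
            if (commit != null) {
                return Optional.of(commit);
            }
        }
        return Optional.empty();
    }
}
```

Exact string matching keeps the sketch simple. Projects with unusual tag schemes would need additional candidates or a fuzzier comparison, and a missing match should be reported as "repository known, revision unknown" rather than guessed.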
The implementation should be designed as a pipeline of resolution strategies, ordered by reliability, so that results from authoritative databases are preferred over heuristic approaches. Where possible, resolution results from previous versions of the same package should be reused, since the source repository typically remains the same across versions.
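One possible shape for such a pipeline is sketched below. All type and method names (`SourceLocation`, `ResolutionStrategy`, `ResolverPipeline`) are hypothetical, and the reuse of earlier results is modeled as a simple in-memory fallback; where exactly reuse fits into the pipeline is a design decision left to the contribution.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical result type: repository URL plus, if known, the exact revision. */
record SourceLocation(String repositoryUrl, Optional<String> revision) {}

/** One resolution strategy, e.g. a database lookup or a metadata heuristic. */
interface ResolutionStrategy {
    Optional<SourceLocation> resolve(String purl);
}

final class ResolverPipeline {

    // Ordered by reliability: authoritative databases first, heuristics last.
    private final List<ResolutionStrategy> strategies;

    // Versionless PURL -> repository URL from an earlier successful resolution.
    private final Map<String, String> knownRepositories = new ConcurrentHashMap<>();

    ResolverPipeline(List<ResolutionStrategy> strategies) {
        this.strategies = List.copyOf(strategies);
    }

    Optional<SourceLocation> resolve(String purl) {
        String key = versionless(purl);
        for (ResolutionStrategy strategy : strategies) {
            Optional<SourceLocation> result = strategy.resolve(purl);
            if (result.isPresent()) {
                knownRepositories.put(key, result.get().repositoryUrl());
                return result; // the first (most reliable) hit wins
            }
        }
        // Fallback: reuse the repository of another version; revision stays unknown.
        return Optional.ofNullable(knownRepositories.get(key))
                .map(url -> new SourceLocation(url, Optional.empty()));
    }

    /** Cuts the PURL at the first '@', '?' or '#', leaving type, namespace and name. */
    static String versionless(String purl) {
        int end = purl.length();
        for (char separator : new char[] {'@', '?', '#'}) {
            int index = purl.indexOf(separator);
            if (index >= 0 && index < end) {
                end = index;
            }
        }
        return purl.substring(0, end);
    }
}
```

In this sketch a reused repository is returned without a revision, so a caller would still need to run tag matching for the new version. An alternative would be to register the lookup of known repositories as its own strategy within the list.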
Expected Tasks
- Research and evaluate existing PURL-to-source-code databases
- Create detailed GitHub issues for this work
- Define an implementation timeline in the GitHub Project Roadmap
- Implement the solution and adapt your plan if necessary
- Write a brief project report (1-2 pages max)
Technologies
You should be familiar with the following programming languages:
- Java
You should be familiar with the following frameworks/libraries:
- Spring Boot
Knowing the following concepts will be helpful:
- Package URL (PURL) specification
- Package registries (npm, Maven Central, PyPI, etc.)