Google Summer of Code 2024 Final Report

Name: Meet Soni (@inosmeet)
Organization: Python Software Foundation
Sub-organization: cve-bin-tool
Project: Product Mapping using PURLs
Proposal: View/Download

Summary

Project Overview

Over the course of GSoC 2024, my project aimed to enhance the cve-bin-tool by improving its product mapping capabilities to reduce false positives. Initially, the tool relied on explicit, pre-defined mappings between binary signatures and a list of CPE identifiers. However, this approach had significant limitations. As vulnerabilities evolved and new components emerged, maintaining these mappings required frequent updates, and the tool struggled with arbitrary product names, such as those found in Rust’s Cargo.lock files.

To address these issues, my project implemented a more flexible mapping system that fits into the tool’s existing structure. The new system uses the purl2cpe database to handle varied and previously unsupported product names, which reduces the need for constant mapping updates and improves the accuracy of vulnerability detection.

What is purl2cpe?

purl2cpe is a database that contains relations between CPEs (Common Platform Enumeration identifiers) and PURLs (Package URLs). PURL is an open specification that standardizes how software packages and versions are identified and located in their respective repositories.

While CPEs provide a precise identification for components and versions, they do not provide an easy way to connect these vulnerable component versions with their respective Open Source repositories. These connections must be made available by human curation.

purl2cpe makes it easy for anyone to monitor the packages they use for known vulnerabilities.
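To make the two identifier formats concrete, here is a minimal sketch that splits a PURL and a CPE 2.3 string into their parts. The helper names (`parse_purl`, `cpe_vendor_product`) and the sample strings are illustrative assumptions, not code from cve-bin-tool or purl2cpe, and the parsing is simplified: it ignores percent-encoding in PURLs and escaped colons in CPEs.

```python
from typing import NamedTuple, Optional, Tuple


class Purl(NamedTuple):
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]


def parse_purl(purl: str) -> Purl:
    """Parse the common pkg:type/namespace/name@version form.

    Simplified: qualifiers and subpath are dropped, nothing is percent-decoded.
    """
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg" or not rest:
        raise ValueError(f"not a PURL: {purl!r}")
    # Drop the optional "#subpath" and "?qualifiers" suffixes.
    rest = rest.split("#", 1)[0].split("?", 1)[0]
    path, _, version = rest.partition("@")
    type_, _, remainder = path.partition("/")
    # Everything before the last "/" is the namespace (may be empty).
    namespace, _, name = remainder.rpartition("/")
    if not type_ or not name:
        raise ValueError(f"incomplete PURL: {purl!r}")
    return Purl(type_, namespace or None, name, version or None)


def cpe_vendor_product(cpe: str) -> Tuple[str, str]:
    """Return (vendor, product) from a CPE 2.3 formatted string.

    Simplified: assumes no escaped colons inside the fields.
    """
    parts = cpe.split(":")
    if len(parts) != 13 or parts[:2] != ["cpe", "2.3"]:
        raise ValueError(f"not a CPE 2.3 string: {cpe!r}")
    return parts[3], parts[4]


if __name__ == "__main__":
    # Illustrative sample identifiers.
    print(parse_purl("pkg:github/madler/zlib@v1.2.11"))
    print(cpe_vendor_product("cpe:2.3:a:zlib:zlib:1.2.11:*:*:*:*:*:*:*"))
```

The PURL says where a package lives (ecosystem, namespace, name), while the CPE names a vendor and product. A relation between the two is what lets a package found in a lock file be connected to the vendor and product pair used in vulnerability data.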

Why We Needed the Mismatch Database

To further reduce false positives, especially in cases where purl2cpe does not find a match, we developed the mismatch database. Previously, when no match was found, the tool would revert to searching product names directly, which led to incorrect associations with vulnerabilities. The mismatch database serves as a source of “anti-matching” information, ensuring that similarly named but unrelated products do not cause false positives.
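The lookup order described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual cve-bin-tool implementation: the three dictionaries are hypothetical in-memory stand-ins for the purl2cpe data, the product-name search, and the mismatch database, and every package and vendor name in them is made up.

```python
from typing import Dict, Set, Tuple

VendorProduct = Tuple[str, str]

# Hypothetical stand-in for purl2cpe: PURL -> known (vendor, product) pairs.
PURL2CPE: Dict[str, Set[VendorProduct]] = {
    "pkg:cargo/knownlib": {("known_vendor", "knownlib")},
}

# Hypothetical stand-in for the name search: product name -> vendors
# that ship something with that name.
NAME_INDEX: Dict[str, Set[str]] = {
    "parser": {"parser_project", "unrelated_corp"},
}

# Hypothetical stand-in for the mismatch database: PURL -> vendors that
# are known NOT to correspond to this package.
MISMATCHES: Dict[str, Set[str]] = {
    "pkg:cargo/parser": {"unrelated_corp"},
}


def resolve(purl: str, product: str) -> Set[VendorProduct]:
    """Return candidate (vendor, product) pairs for a package."""
    # 1. Prefer an explicit PURL-to-CPE relation when one exists.
    mapped = PURL2CPE.get(purl)
    if mapped:
        return set(mapped)
    # 2. Otherwise fall back to the product-name search, but drop any
    #    vendor recorded as a mismatch for this PURL.
    vendors = NAME_INDEX.get(product, set()) - MISMATCHES.get(purl, set())
    return {(vendor, product) for vendor in vendors}


if __name__ == "__main__":
    print(resolve("pkg:cargo/knownlib", "knownlib"))
    print(resolve("pkg:cargo/parser", "parser"))
```

In this sketch, a package named `parser` would have matched two vendors by name alone. The mismatch entry removes the unrelated one, so its vulnerabilities are no longer reported against the package.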

Potential Uses Beyond cve-bin-tool

While initially intended for cve-bin-tool, this database has the potential to be useful outside of the project as well. By making it available, we hope to support efforts in the broader community to de-duplicate similarly named software components and enhance the accuracy of vulnerability management tools.

Resolving Key Issues

Through this project, I was able to resolve key issues identified in the community (issue #3152, issue #3179), contributing to a more reliable and scalable solution for vulnerability management.

Tasks Achieved

Integrate purl2cpe database:
PRs:
Mismatch database:
PRs:
Convert Mismatch database into a standalone entity:
PRs:

Future Work

Expanding the Mismatch Database

The current implementation of the mismatch database focuses primarily on vendors. This approach is useful but faces challenges when dealing with unknown vendors. As highlighted in issue #3193, unknown vendors can lead to gaps in data and unresolved mismatches.

To address these limitations, the database could be expanded to cover additional attributes, such as invalid PURLs. A broader scope would let the database handle more complex cases, improve the accuracy of vulnerability detection, and reduce the likelihood of false positives.

What stood out to me

What I really enjoyed during this period was having the freedom to make decisions and try things my way. It felt great to have that level of ownership. Working alongside industry experts was a huge bonus: I got to learn from their experience while still having the space to explore and grow on my own. Overall, it was a super empowering experience.

Looking back, if I had a chance to do it all over again, I’d just sprinkle in a bit more structure to balance the freedom with some checkpoints, because even the best adventures can benefit from a few guiding stars.

I want to extend my heartfelt thanks to Google for providing this incredible opportunity through the GSoC program. It’s been an amazing journey of learning and growth. A big thank you to the Python Software Foundation for fostering such a vibrant community.

I’m especially grateful to my mentors, Terri Oda, Anthony Harrison and Ben Lewis. Your guidance, patience, and encouragement made all the difference. Thank you for believing in me and helping me navigate this project with confidence. This experience wouldn’t have been the same without your support.

Lastly, a shoutout to my fellow GSoC contributor, Sanskar. Collaborating with you made this journey even more rewarding. I’m glad we could support each other along the way.
