Identifying open source party software with ScanCode Toolkit

Open Source for Open Source

Upload: nexb-inc

Post on 16-Apr-2017


Category:

Software



TRANSCRIPT

Page 1: Identifying open source party software with ScanCode Toolkit

Open Source for Open Source

Page 2: Identifying open source party software with ScanCode Toolkit

Agenda

▷ Introduction to ScanCode Toolkit

▷ Demo

▷ ScanCode Details

▷ About nexB

Page 3: Identifying open source party software with ScanCode Toolkit

Benefits of an open source scanner

As a developer:

▷ I get normalized and comprehensive license and origin data

▷ I can find the license immediately when I evaluate a library

▷ I can identify and resolve license issues before a release

▷ I can identify issues for each commit

▷ I can communicate clearly with legal and business about license and origin of third-party code

You can use the Apache-licensed ScanCode Toolkit now!

Participate by contributing code, license rules, bugs, suggestions.

Page 4: Identifying open source party software with ScanCode Toolkit

What does ScanCode Toolkit do?

It scans source and binary code to find:

▷ License notices, texts and “mentions”

▷ Copyright notices

▷ Package-level information (RPM, nuget, NPM, Jar, etc.)

▷ Other provenance clues (author, email, etc.)

▷ File-level information (type, name, checksums, etc.)
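As an illustration of the file-level information listed above (type, name, checksums), here is a minimal Python sketch using only the standard library. It is not ScanCode's own code: the function name `file_info` and the returned keys are assumptions made for this example, and the type is only guessed from the file extension.

```python
import hashlib
import mimetypes
import os


def file_info(path):
    """Return basic file-level facts: name, size, guessed type and checksums.

    Hypothetical helper for illustration; not part of ScanCode.
    """
    sha1 = hashlib.sha1()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large binaries do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            sha1.update(chunk)
            sha256.update(chunk)
    # Guessed from the file name only; None when the extension is unknown.
    mime_type, _encoding = mimetypes.guess_type(path)
    return {
        "name": os.path.basename(path),
        "size": os.path.getsize(path),
        "mime_type": mime_type,
        "sha1": sha1.hexdigest(),
        "sha256": sha256.hexdigest(),
    }
```

Calling `file_info("some/file.c")` returns a plain dictionary that can be serialized to JSON alongside other scan results.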

Page 5: Identifying open source party software with ScanCode Toolkit

ScanCode Results are provided as:

▷ JSON file

▷ Dynamic HTML

▷ Static HTML table usable in a spreadsheet

Page 6: Identifying open source party software with ScanCode Toolkit

Demo Time

Page 8: Identifying open source party software with ScanCode Toolkit

ScanCode Toolkit Licensing

Component               License                   Notes

Software                Apache 2.0                With an acknowledgement in the scan output

Reference Data          CC0 1.0                   Public Domain

Third Party Components  L/GPL, MIT, BSD, Apache   Various Licenses

Page 9: Identifying open source party software with ScanCode Toolkit

ScanCode Toolkit Roadmap

▷ nexB is migrating features from our proprietary scanning tools to ScanCode incrementally over the next year (2016)

▷ Roadmap at https://github.com/nexB/scancode-toolkit/wiki/Roadmap

Page 10: Identifying open source party software with ScanCode Toolkit

Thanks!

Any questions?

Page 11: Identifying open source party software with ScanCode Toolkit

Credits

Special thanks to all the people who made and released these awesome free resources:

▷ Presentation template by SlidesCarnival

▷ Photographs by Unsplash

▷ And all the software authors that made ScanCode possible

Page 12: Identifying open source party software with ScanCode Toolkit

Additional Details

▷ ScanCode by the numbers

▷ What is scanning?

▷ How does ScanCode work?

▷ About nexB

Page 13: Identifying open source party software with ScanCode Toolkit

ScanCode by the numbers

▷ Over 6,000 tests

▷ 500+ large software products scanned

▷ Over 3,000 licenses, notices and samples

Page 14: Identifying open source party software with ScanCode Toolkit

ScanCode - Technology

▷ Written primarily in Python

○ also JavaScript, Ruby, Java and C/C++

▷ Tested on Linux, OS X and Windows

▷ Command line tool or library

▷ Simple HTML browser-app (any modern browser) - runs locally

Page 15: Identifying open source party software with ScanCode Toolkit

What is Scanning?

Detect and discover “evidence” of origin and license in code (source or binary files)

▷ Copyright notice

▷ License notice and/or license text

▷ Software package manifests

▷ Email, URL, author or other names

▷ Other origin and license clues found in the code

Page 16: Identifying open source party software with ScanCode Toolkit

Scanning is not Matching

Matching looks for similarities between your code and an index (digital fingerprints) of OSS code

▷ If your code is similar it “may” share a similar origin

▷ Matching may be applied at multiple levels

○ Package

○ File or snippet
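To make the idea of matching against an index of fingerprints concrete, here is a minimal Python sketch of file-level matching. It is an illustration only, not how any particular matching tool works: a SHA-1 digest stands in for the "digital fingerprint", and the index content and the names `fingerprint`, `build_index` and `match` are hypothetical.

```python
import hashlib


def fingerprint(data):
    """Digest used here as a stand-in for a 'digital fingerprint' of a file."""
    return hashlib.sha1(data).hexdigest()


def build_index(known_files):
    """Map fingerprint -> origin label for a collection of known OSS files."""
    return {fingerprint(content): origin
            for origin, content in known_files.items()}


def match(data, index):
    """Return the origin of an identical known file, or None if there is none."""
    return index.get(fingerprint(data))


# Hypothetical index content, for illustration only.
INDEX = build_index({
    "example-lib 1.0: util.c": b"int add(int a, int b) { return a + b; }\n",
})
```

A whole-file digest only finds byte-identical copies. Snippet-level matching needs finer-grained fingerprints over parts of a file, which is also where false positives become more likely.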

Page 17: Identifying open source party software with ScanCode Toolkit

Scanning plus Matching

▷ Scanning will identify origin and license in most cases, but

○ Does not detect copying of snippets, or

○ Intentional stripping of notices, etc.

▷ Matching can identify code that was copied and/or stripped, but

○ Typically produces MANY false positives and requires extensive review

○ Especially for the most commonly used OSS projects

Page 18: Identifying open source party software with ScanCode Toolkit

How does ScanCode work? (1)

▷ Each file is categorized based on its type

▷ Archives and compressed files are fully extracted

▷ The text of each file is collected (source and binaries)

▷ Each file's text is then "scanned"

▷ Results are formatted and returned as a JSON file

▷ You can view the results in a browser, or

▷ Use the JSON file as you want
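The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not ScanCode's implementation: only zip archives are extracted, type detection is a crude text/binary check, the "scan" is a placeholder keyword search, and the function names and the `files` key in the JSON output are made up for this example.

```python
import json
import os
import re
import zipfile


def categorize(path):
    """Very rough type detection: 'archive', 'binary' or 'text'."""
    if zipfile.is_zipfile(path):
        return "archive"
    with open(path, "rb") as handle:
        head = handle.read(4096)
    return "binary" if b"\0" in head else "text"


def collect_text(path, kind):
    """Return readable text: the whole file for text, printable runs for binaries."""
    with open(path, "rb") as handle:
        data = handle.read()
    if kind == "text":
        return data.decode("utf-8", errors="replace")
    # Like the Unix `strings` tool: keep runs of 4+ printable ASCII bytes.
    runs = re.findall(rb"[\x20-\x7e]{4,}", data)
    return "\n".join(run.decode("ascii") for run in runs)


def scan_text(text):
    """Placeholder 'scan': report lines that mention a copyright or a license."""
    return [line.strip() for line in text.splitlines()
            if re.search(r"copyright|licen[sc]e", line, re.IGNORECASE)]


def scan_tree(root):
    """Scan every file under root; zip archives are extracted and scanned too."""
    results = []
    pending, visited = [root], set()
    while pending:
        folder = pending.pop()
        if folder in visited:
            continue
        visited.add(folder)
        for entry in sorted(os.listdir(folder)):
            path = os.path.join(folder, entry)
            if os.path.isdir(path):
                pending.append(path)
                continue
            kind = categorize(path)
            if kind == "archive":
                # Extract next to the archive, then scan the extracted files.
                target = path + "-extract"
                with zipfile.ZipFile(path) as archive:
                    archive.extractall(target)  # only do this with trusted input
                pending.append(target)
                continue
            results.append({
                "path": os.path.relpath(path, root),
                "type": kind,
                "clues": scan_text(collect_text(path, kind)),
            })
    return json.dumps({"files": results}, indent=2)
```

`scan_tree("path/to/code")` returns a JSON string that can be written to a file, loaded into a viewer, or post-processed by other tools.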

Page 19: Identifying open source party software with ScanCode Toolkit

How does ScanCode work? (2)

▷ For licenses, the techniques are similar to DNA analysis with multi-pattern matching

▷ Licenses are found exactly or approximately based on a set of thousands of license texts, notices and examples

▷ For copyrights, a syntax and grammar analyzer captures the many forms of copyright statements

▷ Emails, URLs, authors, person names and other data are captured using similar pattern matching techniques
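A toy Python version of these detection techniques is sketched below. It deliberately swaps in simpler methods than the ones described above: `difflib` similarity over word windows stands in for multi-pattern license matching, and a regular expression stands in for the copyright grammar analyzer. The two reference notices, the threshold and all function names are assumptions for this example.

```python
import difflib
import re

# Tiny stand-in for a reference set of license texts and notices (hypothetical).
KNOWN_NOTICES = {
    "apache-2.0": "Licensed under the Apache License, Version 2.0",
    "mit": "Permission is hereby granted, free of charge, "
           "to any person obtaining a copy",
}

# Simplified patterns; real-world statements come in many more forms.
COPYRIGHT = re.compile(r"copyright\s*(?:\(c\)\s*)?\d{4}[^\n]*", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL = re.compile(r"https?://[^\s\"'<>]+")


def normalize(text):
    """Lowercase and keep only words, so layout and punctuation are ignored."""
    return re.findall(r"[a-z0-9]+", text.lower())


def detect_licenses(text, threshold=0.8):
    """Return (key, score) for notices found exactly or approximately."""
    words = normalize(text)
    hits = []
    for key, notice in KNOWN_NOTICES.items():
        target = normalize(notice)
        best = 0.0
        # Slide a window the size of the notice over the text's words.
        for start in range(max(1, len(words) - len(target) + 1)):
            window = words[start:start + len(target)]
            score = difflib.SequenceMatcher(None, window, target).ratio()
            best = max(best, score)
        if best >= threshold:
            hits.append((key, round(best, 2)))
    return hits


def find_clues(text):
    """Collect license, copyright, email and URL clues from a text."""
    return {
        "licenses": detect_licenses(text),
        "copyrights": COPYRIGHT.findall(text),
        "emails": EMAIL.findall(text),
        "urls": URL.findall(text),
    }
```

Because the comparison is approximate, a notice with a small variation such as "Apache Licence" instead of "Apache License" can still score above the threshold, while unrelated text stays well below it.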

Page 20: Identifying open source party software with ScanCode Toolkit

Alternatives and complements

▷ Open source such as:

○ FOSSology (C, PHP): regex-based

○ Ninka (Perl): regex- and sentence-based

○ OSLC (Java, unmaintained)

▷ Commercial such as ...

▷ Complementary:

○ AboutCode: document origin side-by-side with code, collect inventory, generate attribution doc

○ TraceCode (not yet released): traces the source-to-binary transformation to find (static) linking and the subset of the source code actually used (dynamically traces a build or does a static analysis)

Page 21: Identifying open source party software with ScanCode Toolkit

About nexB Inc.

We offer:

▷ DejaCode™ - Open Data Platform for Managing Open Source - http://www.dejacode.com/

▷ Open Source Scanning & Attribution Generation Tools - https://github.com/nexB

▷ Open Source Software Expert Audit Services - http://www.nexb.com/services.html