Commit f8aae9b

Add user-facing documentation around fingerprinting snippet-scans (#1576)
* Add user-facing documentation around fingerprinting snippet-scans, add a proper `analyze --help` message around `x-snippet-scan`:

```
--x-snippet-scan    Experimental flag to enable snippet scanning to identify open source code snippets using fingerprinting.
```

* Provide ficus with a sensible default endpoint
* Add default endpoint to pass to ficus
1 parent 1d7368f commit f8aae9b

4 files changed

Lines changed: 84 additions & 6 deletions

Changelog.md

Lines changed: 4 additions & 0 deletions
@@ -1,5 +1,9 @@
 # FOSSA CLI Changelog
 
+## 3.11.1
+
+- Better document `--x-snippet-scan`.
+
 ## 3.11.0
 
 - Add a dependency on Ficus, a new internal tool.

docs/references/subcommands/analyze.md

Lines changed: 74 additions & 0 deletions
@@ -142,6 +142,80 @@ In addition to the [standard flags](#specifying-fossa-project-details), the anal
 | `--strict` | Enforces strict analysis to ensure the most accurate results by rejecting fallbacks. When run with `--static-only-analysis`, the most optimal static strategy will be applied without fallbacks. |
 
 
+### Snippet Scanning
+
+Snippet scanning identifies potential open source code snippets within your first-party source code by comparing file fingerprints against FOSSA's knowledge base. This feature helps detect code that may have been copied from open source projects.
+
+#### Enabling Snippet Scanning
+
+| Name | Description |
+|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `--x-snippet-scan` | Enable snippet scanning during analysis. This experimental feature fingerprints your source files and checks them against FOSSA's snippet database via ScanOSS integration. |
+
+#### How Snippet Scanning Works
+
+When `--x-snippet-scan` is enabled, the CLI:
+
+1. **Hashes Files First**: Creates CRC64 hashes of all source files to identify which files need fingerprinting
+2. **Checks Necessity of Fingerprinting**: Checks with FOSSA servers to determine which file hashes are already known
+3. **Fingerprints New or Changed Files**: Uses the Ficus fingerprinting engine (written in Rust) to create cryptographic fingerprints only for files not previously seen
+4. **Filters Content**: By default, skips directories like `.git/`, and hidden directories. This includes, from `.fossa.yml`, `vendoredDependencies.licenseScanPathFilters.exclude`, documented further below.
+5. **Uploads Fingerprints**: Sends only the fingerprints to FOSSA's servers via ScanOSS integration
+6. **Receives Matches**: Gets back information about any matching open source components
+7. **Uploads Match Contents**: For files that have matches, uploads source code content temporarily to FOSSA servers.
+
+#### Data Sent to FOSSA
+
+**For Performance Optimization:**
+- CRC64 hashes of all files, to avoid re-fingerprinting unchanged files.
+
+**For Fingerprinting:**
+- ScanOSS-compatible fingerprints of source code to identify matches.
+
+**For Matched Files Only:**
+- The actual source code content of files that contain snippet matches.
+
+#### Data Retention
+
+- **File Fingerprints**: Stored permanently for caching and performance optimization
+- **Source Code Content**: Stored temporarily for 30 days and then automatically deleted
+- **CRC64 Hashes**: The likelihood of a collision with CRC64 (2^64 possible values) is extremely low.
+
+#### Directory Filtering
+
+By default, snippet scanning excludes common non-production directories and follows `.gitignore` patterns:
+
+- Hidden directories.
+- Globs as directed by `.gitignore` files.
+
+#### Custom Exclude Filtering
+
+You can customize which files and directories are excluded from snippet scanning by configuring exclude filters in your `.fossa.yml` file. Note that snippet scanning currently only supports exclude patterns, not `only` patterns.
+
+For example:
+```yaml
+version: 3
+vendoredDependencies:
+  licenseScanPathFilters:
+    exclude:
+      - "**/test/**"
+      - "**/tests/**"
+      - "**/spec/**"
+      - "**/node_modules/**"
+      - "**/dist/**"
+      - "**/build/**"
+      - "**/*.test.js"
+      - "**/*.spec.ts"
+```
+
+**Important Notes:**
+
+- Snippet scanning only uses the `exclude` filters from `licenseScanPathFilters` - `only` filters are ignored for this use-case.
+- Path filters use standard glob patterns (e.g., `**/*` for recursive matching, `*` for single-directory matching).
+- The configuration goes in the `vendoredDependencies.licenseScanPathFilters.exclude` section.
+- These exclude patterns are passed directly to the Ficus fingerprinting engine as `--exclude` arguments.
+- Default exclusions (hidden files, `.gitignore` patterns) are applied in addition to custom excludes.
+
 ### Experimental Options
 
 _Important: For support and other general information, refer to the [experimental options overview](../experimental/README.md) before using experimental options._
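
As a rough illustration of two mechanics the added documentation describes, the sketch below shows, in plain Haskell, how the CLI might pick the files that still need fingerprinting once the server has reported which hashes it already knows (steps 1 to 3), and how `exclude` globs could be turned into `--exclude` arguments. Every name here (`FileHash`, `splitByKnown`, `filesToFingerprint`, `excludeArgs`) is hypothetical and not taken from the FOSSA CLI source, and the one-flag-per-glob argument shape is an assumption rather than documented Ficus behavior.

```haskell
import Data.List (partition)

-- | Hypothetical stand-in for a CRC64 file hash, kept as text for simplicity.
newtype FileHash = FileHash String
  deriving (Eq, Show)

-- | Split hashed files into (already known to the server, still unknown).
-- The list of known hashes stands in for the server's response in step 2.
splitByKnown
  :: [FileHash]
  -> [(FilePath, FileHash)]
  -> ([(FilePath, FileHash)], [(FilePath, FileHash)])
splitByKnown known = partition (\(_, h) -> h `elem` known)

-- | Only files whose hash the server has not seen need fingerprinting (step 3).
filesToFingerprint :: [FileHash] -> [(FilePath, FileHash)] -> [FilePath]
filesToFingerprint known = map fst . snd . splitByKnown known

-- | Render exclude globs as repeated "--exclude <glob>" arguments.
-- Assumption: one flag per glob, each passed as a separate argument.
excludeArgs :: [String] -> [String]
excludeArgs = concatMap (\glob -> ["--exclude", glob])
```

Calling `filesToFingerprint` with the server's known hashes and the locally hashed files yields only the paths that would be handed to the fingerprinting engine; unchanged files drop out before any fingerprinting work is done.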

src/App/Fossa/Config/Analyze.hs

Lines changed: 1 addition & 2 deletions
@@ -114,7 +114,6 @@ import Options.Applicative (
   InfoMod,
   Parser,
   eitherReader,
-  help,
   helpDoc,
   hidden,
   long,
@@ -345,7 +344,7 @@ cliParser =
     <*> flagOpt StaticOnlyTactics (applyFossaStyle <> long "static-only-analysis" <> stringToHelpDoc "Only analyze the project using static strategies.")
     <*> withoutDefaultFilterParser fossaAnalyzeDefaultFilterDocUrl
     <*> flagOpt StrictMode (applyFossaStyle <> long "strict" <> stringToHelpDoc "Enforces strict analysis to ensure the most accurate results by rejecting fallbacks.")
-    <*> switch (applyFossaStyle <> long "x-snippet-scan" <> help "Enable ficus snippet scanning")
+    <*> switch (applyFossaStyle <> long "x-snippet-scan" <> stringToHelpDoc "Experimental flag to enable snippet scanning to identify open source code snippets using fingerprinting.")
   where
     fossaDepsFileHelp :: Maybe (Doc AnsiStyle)
     fossaDepsFileHelp =

src/App/Fossa/Ficus/Analyze.hs

Lines changed: 5 additions & 4 deletions
@@ -44,9 +44,9 @@ import Fossa.API.Types (ApiKey (..), ApiOpts (..))
 import Path (Abs, Dir, File, Path, toFilePath)
 import Srclib.Types (Locator (..), renderLocator)
 import Text.URI (render)
-import Text.URI.Builder ()
+import Text.URI.Builder (PathComponent (PathComponent), TrailingSlash (TrailingSlash), setPath)
 import Types (GlobFilter (..), LicenseScanPathFilters (..))
-import Prelude hiding (unwords)
+import Prelude
 
 newtype CustomLicensePath = CustomLicensePath {unCustomLicensePath :: Text}
   deriving (Eq, Ord, Show, Hashable)
@@ -171,8 +171,9 @@ ficusCommand :: Has Diagnostics sig m => FicusConfig -> BinaryPaths -> m Command
 ficusCommand ficusConfig bin = do
   endpoint <- case ficusConfigEndpoint ficusConfig of
     Just baseUri -> do
-      pure $ render baseUri
-    Nothing -> pure ""
+      proxyUri <- setPath [PathComponent "api", PathComponent "proxy", PathComponent "analysis"] (TrailingSlash False) baseUri
+      pure $ render proxyUri
+    Nothing -> pure "https://app.fossa.com/api/proxy/analysis"
   pure $
     Command
       { cmdName = toText $ toPath bin
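
The endpoint change in this hunk amounts to "use the configured base URL with the proxy path, otherwise fall back to a fixed default". A minimal sketch of that decision, using plain strings instead of the URI type the real code works with, might look like the following. `resolveEndpoint` and `defaultProxyEndpoint` are hypothetical names, and the sketch assumes the configured base URL carries no path of its own.

```haskell
import Data.List (dropWhileEnd)

-- | Default used when no base URL is configured (value taken from the diff).
defaultProxyEndpoint :: String
defaultProxyEndpoint = "https://app.fossa.com/api/proxy/analysis"

-- | Resolve the endpoint handed to the fingerprinting tool.
-- Simplification: the base URL is a plain string assumed to have no path,
-- so the proxy path is appended after trimming any trailing slashes.
-- The commit itself sets the path on a URI value via setPath instead.
resolveEndpoint :: Maybe String -> String
resolveEndpoint Nothing = defaultProxyEndpoint
resolveEndpoint (Just base) =
  dropWhileEnd (== '/') base ++ "/api/proxy/analysis"
```

Before this commit the `Nothing` branch produced an empty string, so the sketch's fallback mirrors the new behavior rather than the old one.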
