Archive support (#1872)

* Start Files Source

* Keep cmd the same for now

* Get tests to pass

* s.MaxTargetMegaBytes -> s.MaxFileSize

* Tweak some log messages & use q semgroup for files

* Just call yield inside the semgroup

* Refactor a check and revert some log lines

* Move mg && misc tuning

* Add `make profile`

(Benchmark results are only added if tools are installed)

Example output:

```
========================================================================
generating profile data
------------------------------------------------------------------------
- mode: dir
  benchmark:
    tool: hyperfine
    path: profile/1748019612/dir/benchmark.json
    results:
      mean: 1.12852930166
      stddev: 0.12163775966037399
      median: 1.07242928416
      user: 3.9699382799999996
      system: 0.04382341999999999
      min: 1.03207673016
      max: 1.3913475181600001
  profile:
    - mode: cpu
      path: profile/1748019612/dir/cpu.pprof
      view: go tool pprof -http=localhost: ./gitleaks profile/1748019612/dir/cpu.pprof
    - mode: mem
      path: profile/1748019612/dir/mem.pprof
      view: go tool pprof -http=localhost: ./gitleaks profile/1748019612/dir/mem.pprof
    - mode: trace
      path: profile/1748019612/dir/trace.out
      view: go tool trace profile/1748019612/dir/trace.out
- mode: git
  ...snip...
```

* Archive support for File based sources

* Move Git to Source interface

* Scan archives in git

* Misc linter fixes

* Exclude certain diffs: --diff-filter=tuxdb

* Add git archives tests

* Log error instead of fatal

* Fix spacing in test config

* Refine source comments/wording

* Make MB conversions more readable

* Format cmd

* Adjust DetectSource logging

* MiB -> MB in logs

* Add testdata/repos/archives/dotGit/refs/.gitkeep

Without it the refs folder gets deleted and then
it isn't detected as a git repo ¯\_(ツ)_/¯

So then the test starts scanning the whole gitleaks
repo instead :D

* Handle finding links for archives

* Working on archive depth

* Fixes from failfast test

* cmd max-archive-depth

* added comments, fixed warnings

* removed test print

* Update README.md

* Set archive depth for DetectFiles

* Tweak logs & pass max archive for readers

* Pass ctx to Fragments

* Add make lint target and address some issues

* Fix compat issue

* Report path in fragment errors

* Apply PR suggestions & misc changes

- Update blobReader.Close() to discard the buffer
- Misc logger issues & uses
- Tweak .golangci.yaml to default to none
- Discard remaining blobReader data on close
- Undo a De Morgan's Law suggestion (and disable QF1001)

* Add http mode for diagnostics

gitleaks --diagnostics=http ...
-----

From the net/http/pprof docs:

Use the pprof tool to look at the heap profile:
```
go tool pprof http://localhost:6060/debug/pprof/heap
```

Or to look at a 30-second CPU profile:
```
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
```

Or to look at the goroutine blocking profile, after calling runtime.SetBlockProfileRate in your program:
```
go tool pprof http://localhost:6060/debug/pprof/block
```

Or to look at the holders of contended mutexes, after calling runtime.SetMutexProfileFraction in your program:
```
go tool pprof http://localhost:6060/debug/pprof/mutex
```

(For more info see https://pkg.go.dev/net/http/pprof)
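
As a rough illustration of what http mode relies on: importing `net/http/pprof` for its side effects registers the `/debug/pprof/` handlers on `http.DefaultServeMux`, which is the mux that `http.ListenAndServe("localhost:6060", nil)` serves. The sketch below is not the gitleaks code. The helper name `pprofIndexStatus` is made up for the example, and it queries the mux in-process with `httptest` instead of opening a port.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	_ "net/http/pprof" // side effect: registers /debug/pprof/* on http.DefaultServeMux
)

// pprofIndexStatus (hypothetical helper) sends an in-process GET request for
// the pprof index page to the default mux and returns the HTTP status code.
func pprofIndexStatus() int {
	req := httptest.NewRequest(http.MethodGet, "/debug/pprof/", nil)
	rec := httptest.NewRecorder()
	http.DefaultServeMux.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("pprof index status:", pprofIndexStatus())
}
```

Because the handlers live on the default mux, passing `nil` as the handler to `http.ListenAndServe` is enough to expose them; no explicit route setup is needed.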

---------

Co-authored-by: Alex Layne <alayne@redhat.com>
bplaxco 8 months ago
parent
commit
782f310478
77 changed files with 2965 additions and 767 deletions
  1. 1 0
      .gitignore
  2. 86 0
      .golangci.yaml
  3. 11 1
      Makefile
  4. 48 5
      README.md
  5. 40 21
      cmd/detect.go
  6. 30 0
      cmd/diagnostics.go
  7. 14 18
      cmd/directory.go
  8. 4 2
      cmd/generate/config/base/config.go
  9. 1 2
      cmd/generate/config/main.go
  10. 1 1
      cmd/generate/config/rules/generic.go
  11. 2 4
      cmd/generate/config/rules/plaid.go
  12. 16 8
      cmd/git.go
  13. 14 9
      cmd/root.go
  14. 12 3
      cmd/stdin.go
  15. 2 2
      config/allowlist.go
  16. 2 6
      config/config.go
  17. 11 11
      config/config_test.go
  18. 2 1
      config/gitleaks.toml
  19. 3 2
      config/rule.go
  20. 1 1
      detect/codec/hex.go
  21. 1 1
      detect/codec/percent.go
  22. 1 1
      detect/codec/start_end.go
  23. 104 38
      detect/detect.go
  24. 930 2
      detect/detect_test.go
  25. 0 203
      detect/directory.go
  26. 92 0
      detect/files.go
  27. 20 177
      detect/git.go
  28. 8 8
      detect/location.go
  29. 1 1
      detect/location_test.go
  30. 41 60
      detect/reader.go
  31. 23 42
      detect/utils.go
  32. 42 23
      detect/utils_test.go
  33. 19 0
      go.mod
  34. 286 0
      go.sum
  35. 2 2
      report/finding.go
  36. 2 1
      report/template.go
  37. 80 0
      scripts/profile.sh
  38. 125 0
      sources/common.go
  39. 3 3
      sources/common_test.go
  40. 0 105
      sources/directory.go
  41. 248 0
      sources/file.go
  42. 180 0
      sources/files.go
  43. 26 0
      sources/fragment.go
  44. 301 2
      sources/git.go
  45. 16 0
      sources/source.go
  46. BIN
      testdata/archives/files.7z
  47. BIN
      testdata/archives/files.tar
  48. BIN
      testdata/archives/files.tar.xz
  49. BIN
      testdata/archives/files.tar.zst
  50. BIN
      testdata/archives/files.zip
  51. 6 0
      testdata/archives/files/.env.prod
  52. 1 0
      testdata/archives/files/.gitleaksignore
  53. 24 0
      testdata/archives/files/api.go
  54. 24 0
      testdata/archives/files/main.go
  55. BIN
      testdata/archives/files/main.go.gz
  56. BIN
      testdata/archives/files/main.go.xz
  57. BIN
      testdata/archives/files/main.go.zst
  58. BIN
      testdata/archives/nested.tar.gz
  59. 21 0
      testdata/config/archives.toml
  60. 0 1
      testdata/config/simple.toml
  61. 0 0
      testdata/repos/archives/.gitleaksignore
  62. 10 0
      testdata/repos/archives/README.md
  63. 1 0
      testdata/repos/archives/dotGit/HEAD
  64. 1 0
      testdata/repos/archives/dotGit/ORIG_HEAD
  65. 13 0
      testdata/repos/archives/dotGit/config
  66. 1 0
      testdata/repos/archives/dotGit/description
  67. BIN
      testdata/repos/archives/dotGit/index
  68. 6 0
      testdata/repos/archives/dotGit/info/exclude
  69. 1 0
      testdata/repos/archives/dotGit/info/refs
  70. BIN
      testdata/repos/archives/dotGit/objects/info/commit-graph
  71. 2 0
      testdata/repos/archives/dotGit/objects/info/packs
  72. BIN
      testdata/repos/archives/dotGit/objects/pack/pack-9d774732f0e985d717a26e126e6574d089375b0d.idx
  73. BIN
      testdata/repos/archives/dotGit/objects/pack/pack-9d774732f0e985d717a26e126e6574d089375b0d.pack
  74. BIN
      testdata/repos/archives/dotGit/objects/pack/pack-9d774732f0e985d717a26e126e6574d089375b0d.rev
  75. 2 0
      testdata/repos/archives/dotGit/packed-refs
  76. 0 0
      testdata/repos/archives/dotGit/refs/.gitkeep
  77. BIN
      testdata/repos/archives/main.go.zst

+ 1 - 0
.gitignore

@@ -9,6 +9,7 @@
 *.got
 gitleaks
 build
+profile
 
 # configs
 .gitleaks.toml

+ 86 - 0
.golangci.yaml

@@ -0,0 +1,86 @@
+version: '2'
+linters:
+  default: none
+  # It might be worth going through some of the disabled linters and enabling
+  # them and fixing the items they call out e.g.: cyclop, prealloc,
+  # paralleltest, prealloc, errcheck, dupl, unused, testifylint, gosec,
+  # gocritic, perfsprint, exptostd, intrange, perfsprint (maybe others?)
+  disable:
+    - cyclop
+    - depguard
+    - dupl
+    - err113
+    - errcheck
+    - exhaustive
+    - exhaustruct
+    - exptostd
+    - forbidigo
+    - funcorder
+    - funlen
+    - gochecknoglobals
+    - gochecknoinits
+    - gocognit
+    - goconst
+    - gocritic
+    - gocyclo
+    - godot
+    - godox
+    - gosec
+    - gosmopolitan
+    - intrange
+    - lll
+    - maintidx
+    - mnd
+    - musttag
+    - nestif
+    - nilerr
+    - nlreturn
+    - nonamedreturns
+    - paralleltest
+    - perfsprint
+    - prealloc
+    - predeclared
+    - tagliatelle
+    - testifylint
+    - testpackage
+    - tparallel
+    - unparam
+    - unused
+    - varnamelen
+    - wastedassign
+    - whitespace
+    - wrapcheck
+    - wsl
+    - zerologlint  # doesn't seem to catch gitleaks/v8/logging mistakes
+  enable:
+    - inamedparam
+    - misspell
+    - revive
+    - misspell
+    - inamedparam
+    - exhaustruct
+    - inamedparam
+    - misspell
+    - nonamedreturns
+    - staticcheck
+    - unconvert
+  exclusions:
+    rules:
+      - linters:
+          - staticcheck
+        source: 'detector\.Detect\w+\(|sources\.DirectoryTargets\(|detect\.(?:Fragment|RemoteInfo)'
+      - linters:
+          - misspell
+        source: '"(?:addres|busines|clas)",'
+  settings:
+    staticcheck:
+      checks:
+        - all
+        - '-QF1001'
+        - '-ST1000'
+        - '-ST1003'
+        - '-ST1018'
+        - '-ST1020'
+        - '-ST1021'
+    revive:
+      severity: error

+ 11 - 1
Makefile

@@ -1,4 +1,4 @@
-.PHONY: test test-cover
+.PHONY: test test-cover failfast profile clean format build
 
 PKG=github.com/zricethezav/gitleaks
 VERSION := `git fetch --tags && git tag | sort -V | tail -1`
@@ -15,13 +15,23 @@ format:
 test: config/gitleaks.toml format
 	go test -v ./... --race $(PKG)
 
+failfast: format
+	go test -failfast ./...
+
 build: config/gitleaks.toml format
 	go mod tidy
 	go build $(LDFLAGS)
 
+lint:
+	golangci-lint run
+
 clean:
+	rm -rf profile
 	find . -type f -name '*.got.*' -delete
 	find . -type f -name '*.out' -delete
 
+profile: build
+	./scripts/profile.sh './gitleaks' '.'
+
 config/gitleaks.toml: $(wildcard cmd/generate/config/**/*)
 	go generate ./...

+ 48 - 5
README.md

@@ -24,7 +24,7 @@
 [gitleaks-playground]: https://gitleaks.io/playground
 
 
-[![Github Action Test][badge-build]][build]
+[![GitHub Action Test][badge-build]][build]
 [![Docker Hub][dockerhub-badge]][dockerhub]
 [![Gitleaks Playground][gitleaks-playground-badge]][gitleaks-playground]
 [![Gitleaks Action][gitleaks-badge]][gitleaks-action]
@@ -119,7 +119,7 @@ jobs:
          - id: gitleaks
    ```
 
-   for a [native execution of GitLeaks](https://github.com/gitleaks/gitleaks/releases) or use the [`gitleaks-docker` pre-commit ID](https://github.com/gitleaks/gitleaks/blob/master/.pre-commit-hooks.yaml) for executing GitLeaks using the [official Docker images](#docker)
+   for a [native execution of gitleaks](https://github.com/gitleaks/gitleaks/releases) or use the [`gitleaks-docker` pre-commit ID](https://github.com/gitleaks/gitleaks/blob/master/.pre-commit-hooks.yaml) for executing gitleaks using the [official Docker images](#docker)
 
 3. Auto-update the config to the latest repos' versions by executing `pre-commit autoupdate`
 4. Install with `pre-commit install`
@@ -169,6 +169,7 @@ Flags:
       --ignore-gitleaks-allow         ignore gitleaks:allow comments
   -l, --log-level string              log level (trace, debug, info, warn, error, fatal) (default "info")
       --max-decode-depth int          allow recursive decoding up to this depth (default "0", no decoding is done)
+      --max-archive-depth int         allow scanning into nested archives up to this depth (default "0", no archive traversal is done)
       --max-target-megabytes int      files larger than this will be skipped
       --no-banner                     suppress banner
       --no-color                      turn off color for verbose output
@@ -191,6 +192,7 @@ If you find v8.19.0 broke an existing command (`detect`/`protect`), please open
 There are three scanning modes: `git`, `dir`, and `stdin`.
 
 #### Git
+
 The `git` command lets you scan local git repos. Under the hood, gitleaks uses the `git log -p` command to scan patches.
 You can configure the behavior of `git log -p` with the `log-opts` option.
 For example, if you wanted to run gitleaks on a range of commits you could use the following
@@ -198,10 +200,12 @@ command: `gitleaks git -v --log-opts="--all commitA..commitB" path_to_repo`. See
 If there is no target specified as a positional argument, then gitleaks will attempt to scan the current working directory as a git repo.
 
 #### Dir
+
 The `dir` (aliases include `files`, `directory`) command lets you scan directories and files. Example: `gitleaks dir -v path_to_directory_or_file`.
 If there is no target specified as a positional argument, then gitleaks will scan the current working directory.
 
 #### Stdin
+
 You can also stream data to gitleaks with the `stdin` command. Example: `cat some_file | gitleaks -v stdin`
 
 ### Creating a baseline
@@ -371,7 +375,7 @@ id = "gitlab-pat"
 
 
 # ⚠️ In v8.25.0 `[allowlist]` was replaced with `[[allowlists]]`.
-# 
+#
 # Global allowlists have a higher order of precedence than rule-specific allowlists.
 # If a commit listed in the `commits` field below is encountered then that commit will be skipped and no
 # secrets will be detected for said commit. The same logic applies for regexes and paths.
@@ -452,8 +456,47 @@ ways:
 Currently supported encodings:
 
 - **percent** - Any printable ASCII percent encoded values
-- **hex** - Any printable ASCII hex encoded values >= 32 characters 
-- **base64** - Any printable ASCII base64 encoded values >= 16 characters 
+- **hex** - Any printable ASCII hex encoded values >= 32 characters
+- **base64** - Any printable ASCII base64 encoded values >= 16 characters
+
+#### Archive Scanning
+
+Sometimes secrets are packaged within archive files like zip files or tarballs,
+making them difficult to discover. Now you can tell gitleaks to automatically
+extract and scan the contents of archives. The flag `--max-archive-depth`
+enables this feature for both `dir` and `git` scan types. The default value of
+"0" means this feature is disabled by default.
+
+Recursive scanning is supported since archives can also contain other archives.
+The `--max-archive-depth` flag sets the recursion limit. Recursion stops when
+there are no new archives to extract, so setting a very high max depth just
+sets the potential to go that deep. It will only go as deep as it needs to.
+
+The findings for secrets located within an archive will include the path to the
+file inside the archive. Inner paths are separated with `!`.
+
+Example finding (shortened for brevity):
+
+```
+Finding:     DB_PASSWORD=8ae31cacf141669ddfb5da
+...
+File:        testdata/archives/nested.tar.gz!archives/files.tar!files/.env.prod
+Line:        4
+Commit:      6e6ee6596d337bb656496425fb98644eb62b4a82
+...
+Fingerprint: 6e6ee6596d337bb656496425fb98644eb62b4a82:testdata/archives/nested.tar.gz!archives/files.tar!files/.env.prod:generic-api-key:4
+Link:        https://github.com/leaktk/gitleaks/blob/6e6ee6596d337bb656496425fb98644eb62b4a82/testdata/archives/nested.tar.gz
+```
+
+This means a secret was detected on line 4 of `files/.env.prod.` which is in
+`archives/files.tar` which is in `testdata/archives/nested.tar.gz`.
+
+Currently supported formats:
+
+The [compression](https://github.com/mholt/archives?tab=readme-ov-file#supported-compression-formats)
+and [archive](https://github.com/mholt/archives?tab=readme-ov-file#supported-archive-formats)
+formats supported by mholt's [archives package](https://github.com/mholt/archives)
+are supported.
 
 #### Reporting
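
Note on the Archive Scanning section in the hunk above: the `!`-separated `File` value from the example finding can be split back into its nesting levels. This is a minimal sketch, assuming `!` never occurs inside a real path segment. `splitArchivePath` is a hypothetical helper, not part of gitleaks.

```go
package main

import (
	"fmt"
	"strings"
)

// splitArchivePath (hypothetical helper) breaks a finding's file path into
// one element per nesting level: the outermost archive first, the file that
// contains the secret last.
func splitArchivePath(p string) []string {
	return strings.Split(p, "!")
}

func main() {
	// Path taken from the example finding in the README hunk.
	path := "testdata/archives/nested.tar.gz!archives/files.tar!files/.env.prod"
	for depth, part := range splitArchivePath(path) {
		fmt.Printf("depth %d: %s\n", depth, part)
	}
}
```

For the example path this yields three levels, so reaching `files/.env.prod` would presumably need `--max-archive-depth` set to at least 2. The exact depth accounting is an assumption here, not something the README text states.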
 

+ 40 - 21
cmd/detect.go

@@ -19,13 +19,13 @@
 package cmd
 
 import (
+	"context"
 	"os"
 	"time"
 
 	"github.com/spf13/cobra"
 
 	"github.com/zricethezav/gitleaks/v8/cmd/scm"
-	"github.com/zricethezav/gitleaks/v8/detect"
 	"github.com/zricethezav/gitleaks/v8/logging"
 	"github.com/zricethezav/gitleaks/v8/report"
 	"github.com/zricethezav/gitleaks/v8/sources"
@@ -51,15 +51,15 @@ var detectCmd = &cobra.Command{
 func runDetect(cmd *cobra.Command, args []string) {
 	// start timer
 	start := time.Now()
-	source := mustGetStringFlag(cmd, "source")
+	sourcePath := mustGetStringFlag(cmd, "source")
 
 	// setup config (aka, the thing that defines rules)
-	initConfig(source)
+	initConfig(sourcePath)
 	initDiagnostics()
 	cfg := Config(cmd)
 
 	// create detector
-	detector := Detector(cmd, cfg, source)
+	detector := Detector(cmd, cfg, sourcePath)
 
 	// parse flags
 	detector.FollowSymlinks = mustGetBoolFlag(cmd, "follow-symlinks")
@@ -71,46 +71,65 @@ func runDetect(cmd *cobra.Command, args []string) {
 	// - git: scan the history of the repo
 	// - no-git: scan files by treating the repo as a plain directory
 	var (
-		findings []report.Finding
 		err      error
+		findings []report.Finding
+		ctx      = context.Background()
 	)
 	if noGit {
-		paths, err := sources.DirectoryTargets(
-			source,
-			detector.Sema,
-			detector.FollowSymlinks,
-			detector.Config.Allowlists,
+		findings, err = detector.DetectSource(
+			ctx, &sources.Files{
+				Config:          &cfg,
+				FollowSymlinks:  detector.FollowSymlinks,
+				MaxFileSize:     detector.MaxTargetMegaBytes * 1_000_000,
+				Path:            sourcePath,
+				Sema:            detector.Sema,
+				MaxArchiveDepth: detector.MaxArchiveDepth,
+			},
 		)
-		if err != nil {
-			logging.Fatal().Err(err).Send()
-		}
 
-		if findings, err = detector.DetectFiles(paths); err != nil {
+		if err != nil {
 			// don't exit on error, just log it
-			logging.Error().Err(err).Msg("failed scan directory")
+			logging.Error().Err(err).Msg("failed to scan directory")
 		}
 	} else if fromPipe {
-		if findings, err = detector.DetectReader(os.Stdin, 10); err != nil {
+		findings, err = detector.DetectSource(
+			ctx, &sources.File{
+				Content:         os.Stdin,
+				MaxArchiveDepth: detector.MaxArchiveDepth,
+			},
+		)
+
+		if err != nil {
 			// log fatal to exit, no need to continue since a report
 			// will not be generated when scanning from a pipe...for now
 			logging.Fatal().Err(err).Msg("failed scan input from stdin")
 		}
 	} else {
 		var (
-			logOpts     = mustGetStringFlag(cmd, "log-opts")
 			gitCmd      *sources.GitCmd
 			scmPlatform scm.Platform
-			remote      *detect.RemoteInfo
 		)
-		if gitCmd, err = sources.NewGitLogCmd(source, logOpts); err != nil {
+
+		logOpts := mustGetStringFlag(cmd, "log-opts")
+		if gitCmd, err = sources.NewGitLogCmd(sourcePath, logOpts); err != nil {
 			logging.Fatal().Err(err).Msg("could not create Git cmd")
 		}
+
 		if scmPlatform, err = scm.PlatformFromString(mustGetStringFlag(cmd, "platform")); err != nil {
 			logging.Fatal().Err(err).Send()
 		}
-		remote = detect.NewRemoteInfo(scmPlatform, source)
 
-		if findings, err = detector.DetectGit(gitCmd, remote); err != nil {
+		findings, err = detector.DetectSource(
+			ctx, &sources.Git{
+				Cmd:             gitCmd,
+				Config:          &detector.Config,
+				Remote:          sources.NewRemoteInfo(scmPlatform, sourcePath),
+				Sema:            detector.Sema,
+				MaxArchiveDepth: detector.MaxArchiveDepth,
+			},
+		)
+
+		if err != nil {
 			// don't exit on error, just log it
 			logging.Error().Err(err).Msg("failed to scan Git repository")
 		}

+ 30 - 0
cmd/diagnostics.go

@@ -1,7 +1,10 @@
 package cmd
 
 import (
+	"errors"
 	"fmt"
+	"net/http"
+	_ "net/http/pprof"
 	"os"
 	"path/filepath"
 	"runtime"
@@ -34,6 +37,14 @@ func NewDiagnosticsManager(diagnosticsFlag string, diagnosticsDir string) (*Diag
 		OutputDir: diagnosticsDir,
 	}
 
+	if diagnosticsFlag == "http" {
+		if len(diagnosticsDir) != 0 {
+			return nil, errors.New("the diagnostics directory should not be set in http mode")
+		}
+
+		return dm, nil
+	}
+
 	// If no output directory is specified, use the current directory
 	if dm.OutputDir == "" {
 		var err error
@@ -87,6 +98,10 @@ func (dm *DiagnosticsManager) StartDiagnostics() error {
 			if err = dm.StartTraceProfile(); err != nil {
 				return err
 			}
+		case "http":
+			if err = dm.StartHttpHandler(); err != nil {
+				return err
+			}
 		default:
 			logging.Warn().Msgf("Unknown diagnostics type: %s", diagType)
 		}
@@ -112,10 +127,25 @@ func (dm *DiagnosticsManager) StopDiagnostics() {
 			dm.WriteMemoryProfile()
 		case "trace":
 			dm.StopTraceProfile()
+		case "http":
+			// No need to stop the http one
 		}
 	}
 }
 
+func (dm *DiagnosticsManager) StartHttpHandler() error {
+	if len(dm.DiagTypes) > 1 {
+		return errors.New("other diagnostics modes should not be enabled when http mode is enabled")
+	}
+
+	go func() {
+		logging.Error().Err(http.ListenAndServe("localhost:6060", nil)).Send()
+	}()
+
+	logging.Info().Str("url", "http://localhost:6060/debug/pprof/").Msg("Diagnostics server started")
+	return nil
+}
+
 // StartCPUProfile starts CPU profiling
 func (dm *DiagnosticsManager) StartCPUProfile() error {
 	cpuProfilePath := filepath.Join(dm.OutputDir, "cpu.pprof")

+ 14 - 18
cmd/directory.go

@@ -1,12 +1,12 @@
 package cmd
 
 import (
+	"context"
 	"time"
 
 	"github.com/spf13/cobra"
 
 	"github.com/zricethezav/gitleaks/v8/logging"
-	"github.com/zricethezav/gitleaks/v8/report"
 	"github.com/zricethezav/gitleaks/v8/sources"
 )
 
@@ -34,11 +34,7 @@ func runDirectory(cmd *cobra.Command, args []string) {
 
 	initConfig(source)
 	initDiagnostics()
-
-	var (
-		findings []report.Finding
-		err      error
-	)
+	var err error
 
 	// setup config (aka, the thing that defines rules)
 	cfg := Config(cmd)
@@ -50,28 +46,28 @@ func runDirectory(cmd *cobra.Command, args []string) {
 
 	// set follow symlinks flag
 	if detector.FollowSymlinks, err = cmd.Flags().GetBool("follow-symlinks"); err != nil {
-		logging.Fatal().Err(err).Msg("")
+		logging.Fatal().Err(err).Send()
 	}
+
 	// set exit code
 	exitCode, err := cmd.Flags().GetInt("exit-code")
 	if err != nil {
 		logging.Fatal().Err(err).Msg("could not get exit code")
 	}
 
-	var paths <-chan sources.ScanTarget
-	paths, err = sources.DirectoryTargets(
-		source,
-		detector.Sema,
-		detector.FollowSymlinks,
-		detector.Config.Allowlists,
+	findings, err := detector.DetectSource(
+		context.Background(),
+		&sources.Files{
+			Config:          &cfg,
+			FollowSymlinks:  detector.FollowSymlinks,
+			MaxFileSize:     detector.MaxTargetMegaBytes * 1_000_000,
+			Path:            source,
+			Sema:            detector.Sema,
+			MaxArchiveDepth: detector.MaxArchiveDepth,
+		},
 	)
-	if err != nil {
-		logging.Fatal().Err(err)
-	}
 
-	findings, err = detector.DetectFiles(paths)
 	if err != nil {
-		// don't exit on error, just log it
 		logging.Error().Err(err).Msg("failed scan directory")
 	}
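
Note on the `MaxFileSize` field: both `sources.Files` literals above pass `detector.MaxTargetMegaBytes * 1_000_000`, i.e. a decimal megabyte (MB), not a mebibyte (MiB), which lines up with "MiB -> MB in logs" in the commit message. Below is a small sketch of that arithmetic. `maxFileSizeBytes` and `tooLarge` are hypothetical names, and treating a non-positive limit as "no limit" is an assumption, not something this diff shows.

```go
package main

import "fmt"

// bytesPerMB is a decimal megabyte; a mebibyte would be 1 << 20 (1,048,576).
const bytesPerMB = 1_000_000

// maxFileSizeBytes (hypothetical) converts the CLI value in MB to bytes.
func maxFileSizeBytes(megabytes int) int {
	return megabytes * bytesPerMB
}

// tooLarge (hypothetical) reports whether size exceeds limit.
// Assumption: a limit <= 0 disables the check.
func tooLarge(size, limit int) bool {
	return limit > 0 && size > limit
}

func main() {
	limit := maxFileSizeBytes(5)
	fmt.Println("limit in bytes:", limit)
	fmt.Println("5 MiB file skipped:", tooLarge(5<<20, limit))
	fmt.Println("5 MiB file skipped with no limit:", tooLarge(5<<20, 0))
}
```

The difference matters at the boundary: a 5 MiB file is 5,242,880 bytes, which is over a 5 MB limit of 5,000,000 bytes.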
 

+ 4 - 2
cmd/generate/config/base/config.go

@@ -66,7 +66,7 @@ func CreateGlobalConfig() config.Config {
 					// ----------- Documents and media -----------
 					regexp.MustCompile(`(?i)\.(?:bmp|gif|jpe?g|png|svg|tiff?)$`), // Images
 					regexp.MustCompile(`(?i)\.(?:eot|[ot]tf|woff2?)$`),           // Fonts
-					regexp.MustCompile(`(?i)\.(?:docx?|xlsx?|pdf|bin|socket|vsidx|v2|suo|wsuo|.dll|pdb|exe|gltf|zip)$`),
+					regexp.MustCompile(`(?i)\.(?:docx?|xlsx?|pdf|bin|socket|vsidx|v2|suo|wsuo|.dll|pdb|exe|gltf)$`),
 
 					// ----------- Golang files -----------
 					regexp.MustCompile(`go\.(?:mod|sum|work(?:\.sum)?)$`),
@@ -105,7 +105,9 @@ func CreateGlobalConfig() config.Config {
 					// Misc
 					regexp.MustCompile(`verification-metadata\.xml`),
 					regexp.MustCompile(`Database.refactorlog`),
-					// regexp.MustCompile(`vendor`),
+
+					// ----------- Git files ------------
+					regexp.MustCompile(`(?:^|/)\.git$`),
 				},
 				StopWords: []string{
 					"abcdefghijklmnopqrstuvwxyz", // character range
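
Note on the new path allowlist entry: `(?:^|/)\.git$` is anchored on both sides, so it should match only a path whose final segment is exactly `.git`. Below is a quick check of that reading against a few made-up paths. How gitleaks feeds paths to this pattern is not shown in this hunk, so the sketch exercises the regex alone.

```go
package main

import (
	"fmt"
	"regexp"
)

// gitDirPattern is copied from the hunk above.
var gitDirPattern = regexp.MustCompile(`(?:^|/)\.git$`)

func main() {
	// Illustrative paths only; none of them come from the commit.
	paths := []string{
		".git",
		"repo/.git",
		"repo/.gitignore",
		"repo/.git/config",
		"my.git",
	}
	for _, p := range paths {
		fmt.Printf("%-18s %v\n", p, gitDirPattern.MatchString(p))
	}
}
```

Only `.git` and `repo/.git` match. The other three do not, because the pattern requires `.git` to start a path segment and to end the string.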

+ 1 - 2
cmd/generate/config/main.go

@@ -2,10 +2,9 @@ package main
 
 import (
 	"os"
+	"slices"
 	"text/template"
 
-	"golang.org/x/exp/slices"
-
 	"github.com/zricethezav/gitleaks/v8/cmd/generate/config/base"
 	"github.com/zricethezav/gitleaks/v8/cmd/generate/config/rules"
 	"github.com/zricethezav/gitleaks/v8/config"

+ 1 - 1
cmd/generate/config/rules/generic.go

@@ -79,7 +79,7 @@ func GenericCredential() *config.Rule {
 
 						// Token
 						`|(?:csrf)[_.-]?token` +
-						`|(?:io\.jsonwebtoken[ \t]?:[ \t]?[\w-]+)` + // Maven library coordinats. (e.g., https://mvnrepository.com/artifact/io.jsonwebtoken/jjwt)
+						`|(?:io\.jsonwebtoken[ \t]?:[ \t]?[\w-]+)` + // Maven library coordinates. (e.g., https://mvnrepository.com/artifact/io.jsonwebtoken/jjwt)
 
 						// General
 						`|(?:api|credentials|token)[_.-]?(?:endpoint|ur[il])` +

+ 2 - 4
cmd/generate/config/rules/plaid.go

@@ -1,8 +1,6 @@
 package rules
 
 import (
-	"fmt"
-
 	"github.com/zricethezav/gitleaks/v8/cmd/generate/config/utils"
 	"github.com/zricethezav/gitleaks/v8/cmd/generate/secrets"
 	"github.com/zricethezav/gitleaks/v8/config"
@@ -50,7 +48,7 @@ func PlaidAccessToken() *config.Rule {
 		RuleID:      "plaid-api-token",
 		Description: "Discovered a Plaid API Token, potentially compromising financial data aggregation and banking services.",
 		Regex: utils.GenerateSemiGenericRegex([]string{"plaid"},
-			fmt.Sprintf("access-(?:sandbox|development|production)-%s", utils.Hex8_4_4_4_12()), true),
+			"access-(?:sandbox|development|production)-"+utils.Hex8_4_4_4_12(), true),
 
 		Keywords: []string{
 			"plaid",
@@ -58,6 +56,6 @@ func PlaidAccessToken() *config.Rule {
 	}
 
 	// validate
-	tps := utils.GenerateSampleSecrets("plaid", secrets.NewSecret(fmt.Sprintf("access-(?:sandbox|development|production)-%s", utils.Hex8_4_4_4_12())))
+	tps := utils.GenerateSampleSecrets("plaid", secrets.NewSecret("access-(?:sandbox|development|production)-"+utils.Hex8_4_4_4_12()))
 	return utils.Validate(r, tps, nil)
 }

+ 16 - 8
cmd/git.go

@@ -1,12 +1,12 @@
 package cmd
 
 import (
+	"context"
 	"time"
 
 	"github.com/spf13/cobra"
 
 	"github.com/zricethezav/gitleaks/v8/cmd/scm"
-	"github.com/zricethezav/gitleaks/v8/detect"
 	"github.com/zricethezav/gitleaks/v8/logging"
 	"github.com/zricethezav/gitleaks/v8/report"
 	"github.com/zricethezav/gitleaks/v8/sources"
@@ -56,19 +56,18 @@ func runGit(cmd *cobra.Command, args []string) {
 	preCommit := mustGetBoolFlag(cmd, "pre-commit")
 
 	var (
-		findings []report.Finding
-		err      error
-
+		findings    []report.Finding
+		err         error
 		gitCmd      *sources.GitCmd
 		scmPlatform scm.Platform
-		remote      *detect.RemoteInfo
 	)
+
 	if preCommit || staged {
 		if gitCmd, err = sources.NewGitDiffCmd(source, staged); err != nil {
 			logging.Fatal().Err(err).Msg("could not create Git diff cmd")
 		}
 		// Remote info + links are irrelevant for staged changes.
-		remote = &detect.RemoteInfo{Platform: scm.NoPlatform}
+		scmPlatform = scm.NoPlatform
 	} else {
 		if gitCmd, err = sources.NewGitLogCmd(source, logOpts); err != nil {
 			logging.Fatal().Err(err).Msg("could not create Git log cmd")
@@ -76,10 +75,19 @@ func runGit(cmd *cobra.Command, args []string) {
 		if scmPlatform, err = scm.PlatformFromString(mustGetStringFlag(cmd, "platform")); err != nil {
 			logging.Fatal().Err(err).Send()
 		}
-		remote = detect.NewRemoteInfo(scmPlatform, source)
 	}
 
-	findings, err = detector.DetectGit(gitCmd, remote)
+	findings, err = detector.DetectSource(
+		context.Background(),
+		&sources.Git{
+			Cmd:             gitCmd,
+			Config:          &detector.Config,
+			Remote:          sources.NewRemoteInfo(scmPlatform, source),
+			Sema:            detector.Sema,
+			MaxArchiveDepth: detector.MaxArchiveDepth,
+		},
+	)
+
 	if err != nil {
 		// don't exit on error, just log it
 		logging.Error().Err(err).Msg("failed to scan Git repository")

+ 14 - 9
cmd/root.go

@@ -76,10 +76,11 @@ func init() {
 	rootCmd.PersistentFlags().StringSlice("enable-rule", []string{}, "only enable specific rules by id")
 	rootCmd.PersistentFlags().StringP("gitleaks-ignore-path", "i", ".", "path to .gitleaksignore file or folder containing one")
 	rootCmd.PersistentFlags().Int("max-decode-depth", 0, "allow recursive decoding up to this depth (default \"0\", no decoding is done)")
+	rootCmd.PersistentFlags().Int("max-archive-depth", 0, "allow scanning into nested archives up to this depth (default \"0\", no archive traversal is done)")
 
 	// Add diagnostics flags
-	rootCmd.PersistentFlags().String("diagnostics", "", "enable diagnostics (comma-separated list: cpu,mem,trace). cpu=CPU profiling, mem=memory profiling, trace=execution tracing")
-	rootCmd.PersistentFlags().String("diagnostics-dir", "", "directory to store diagnostics output files (defaults to current directory)")
+	rootCmd.PersistentFlags().String("diagnostics", "", "enable diagnostics (http OR comma-separated list: cpu,mem,trace). cpu=CPU prof, mem=memory prof, trace=exec tracing, http=serve via net/http/pprof")
+	rootCmd.PersistentFlags().String("diagnostics-dir", "", "directory to store diagnostics output files when not using http mode (defaults to current directory)")
 
 	err := viper.BindPFlag("config", rootCmd.PersistentFlags().Lookup("config"))
 	if err != nil {
@@ -237,12 +238,16 @@ func Detector(cmd *cobra.Command, cfg config.Config, source string) *detect.Dete
 	detector := detect.NewDetector(cfg)
 
 	if detector.MaxDecodeDepth, err = cmd.Flags().GetInt("max-decode-depth"); err != nil {
-		logging.Fatal().Err(err).Msg("")
+		logging.Fatal().Err(err).Send()
+	}
+
+	if detector.MaxArchiveDepth, err = cmd.Flags().GetInt("max-archive-depth"); err != nil {
+		logging.Fatal().Err(err).Send()
 	}
 
 	// set color flag at first
 	if detector.NoColor, err = cmd.Flags().GetBool("no-color"); err != nil {
-		logging.Fatal().Err(err).Msg("")
+		logging.Fatal().Err(err).Send()
 	}
 	// also init logger again without color
 	if detector.NoColor {
@@ -253,7 +258,7 @@ func Detector(cmd *cobra.Command, cfg config.Config, source string) *detect.Dete
 	}
 	detector.Config.Path, err = cmd.Flags().GetString("config")
 	if err != nil {
-		logging.Fatal().Err(err).Msg("")
+		logging.Fatal().Err(err).Send()
 	}
 
 	// if config path is not set, then use the {source}/.gitleaks.toml path.
@@ -263,18 +268,18 @@ func Detector(cmd *cobra.Command, cfg config.Config, source string) *detect.Dete
 	}
 	// set verbose flag
 	if detector.Verbose, err = cmd.Flags().GetBool("verbose"); err != nil {
-		logging.Fatal().Err(err).Msg("")
+		logging.Fatal().Err(err).Send()
 	}
 	// set redact flag
 	if detector.Redact, err = cmd.Flags().GetUint("redact"); err != nil {
-		logging.Fatal().Err(err).Msg("")
+		logging.Fatal().Err(err).Send()
 	}
 	if detector.MaxTargetMegaBytes, err = cmd.Flags().GetInt("max-target-megabytes"); err != nil {
-		logging.Fatal().Err(err).Msg("")
+		logging.Fatal().Err(err).Send()
 	}
 	// set ignore gitleaks:allow flag
 	if detector.IgnoreGitleaksAllow, err = cmd.Flags().GetBool("ignore-gitleaks-allow"); err != nil {
-		logging.Fatal().Err(err).Msg("")
+		logging.Fatal().Err(err).Send()
 	}
 
 	gitleaksIgnorePath, err := cmd.Flags().GetString("gitleaks-ignore-path")

+ 12 - 3
cmd/stdin.go

@@ -1,12 +1,14 @@
 package cmd
 
 import (
+	"context"
 	"os"
 	"time"
 
 	"github.com/spf13/cobra"
 
 	"github.com/zricethezav/gitleaks/v8/logging"
+	"github.com/zricethezav/gitleaks/v8/sources"
 )
 
 func init() {
@@ -35,10 +37,17 @@ func runStdIn(cmd *cobra.Command, _ []string) {
 	// parse flag(s)
 	exitCode := mustGetIntFlag(cmd, "exit-code")
 
-	findings, err := detector.DetectReader(os.Stdin, 10)
+	findings, err := detector.DetectSource(
+		context.Background(),
+		&sources.File{
+			Content:         os.Stdin,
+			MaxArchiveDepth: detector.MaxArchiveDepth,
+		},
+	)
+
 	if err != nil {
-		// log fatal to exit, no need to continue since a report
-		// will not be generated when scanning from a pipe...for now
+		// log fatal to exit, no need to continue since a report will not be
+		// generated when scanning from a pipe...for now
 		logging.Fatal().Err(err).Msg("failed scan input from stdin")
 	}
 

+ 2 - 2
config/allowlist.go

@@ -1,7 +1,7 @@
 package config
 
 import (
-	"fmt"
+	"errors"
 	"strings"
 
 	"golang.org/x/exp/maps"
@@ -69,7 +69,7 @@ func (a *Allowlist) Validate() error {
 		len(a.Paths) == 0 &&
 		len(a.Regexes) == 0 &&
 		len(a.StopWords) == 0 {
-		return fmt.Errorf("must contain at least one check for: commits, paths, regexes, or stopwords")
+		return errors.New("must contain at least one check for: commits, paths, regexes, or stopwords")
 	}
 
 	// Deduplicate commits and stopwords.

+ 2 - 6
config/config.go

@@ -379,9 +379,7 @@ func (c *Config) extend(extensionConfig Config) {
 			}
 			baseRule.Tags = append(baseRule.Tags, currentRule.Tags...)
 			baseRule.Keywords = append(baseRule.Keywords, currentRule.Keywords...)
-			for _, a := range currentRule.Allowlists {
-				baseRule.Allowlists = append(baseRule.Allowlists, a)
-			}
+			baseRule.Allowlists = append(baseRule.Allowlists, currentRule.Allowlists...)
 			// The keywords from the base rule and the extended rule must be merged into the global keywords list
 			for _, k := range baseRule.Keywords {
 				c.Keywords[k] = struct{}{}
@@ -391,9 +389,7 @@ func (c *Config) extend(extensionConfig Config) {
 	}
 
 	// append allowlists, not attempting to merge
-	for _, a := range extensionConfig.Allowlists {
-		c.Allowlists = append(c.Allowlists, a)
-	}
+	c.Allowlists = append(c.Allowlists, extensionConfig.Allowlists...)
 
 	// sort to keep extended rules in order
 	sort.Strings(c.OrderedRules)
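Review note: both hunks above replace an element-by-element loop with a single variadic `append`. A minimal sketch of the equivalence follows; the function names are made up for illustration.

```go
package main

import "fmt"

// appendLoop copies src onto dst one element at a time (the old form).
func appendLoop(dst, src []string) []string {
	for _, s := range src {
		dst = append(dst, s)
	}
	return dst
}

// appendSpread does the same in a single variadic call (the new form).
func appendSpread(dst, src []string) []string {
	return append(dst, src...)
}

func main() {
	base := []string{"a"}
	ext := []string{"b", "c"}
	fmt.Println(appendLoop(base, ext), appendSpread(base, ext))
}
```

The result is the same; the variadic form is shorter and grows the backing array at most once.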

+ 11 - 11
config/config_test.go

@@ -1,7 +1,7 @@
 package config
 
 import (
-	"fmt"
+	"errors"
 	"testing"
 
 	"github.com/google/go-cmp/cmp"
@@ -93,17 +93,17 @@ func TestTranslate(t *testing.T) {
 		{
 			cfgName:   "invalid/rule_missing_id",
 			cfg:       Config{},
-			wantError: fmt.Errorf("rule |id| is missing or empty, regex: (?i)(discord[a-z0-9_ .\\-,]{0,25})(=|>|:=|\\|\\|:|<=|=>|:).{0,5}['\\\"]([a-h0-9]{64})['\\\"]"),
+			wantError: errors.New("rule |id| is missing or empty, regex: (?i)(discord[a-z0-9_ .\\-,]{0,25})(=|>|:=|\\|\\|:|<=|=>|:).{0,5}['\\\"]([a-h0-9]{64})['\\\"]"),
 		},
 		{
 			cfgName:   "invalid/rule_no_regex_or_path",
 			cfg:       Config{},
-			wantError: fmt.Errorf("discord-api-key: both |regex| and |path| are empty, this rule will have no effect"),
+			wantError: errors.New("discord-api-key: both |regex| and |path| are empty, this rule will have no effect"),
 		},
 		{
 			cfgName:   "invalid/rule_bad_entropy_group",
 			cfg:       Config{},
-			wantError: fmt.Errorf("discord-api-key: invalid regex secret group 5, max regex secret group 3"),
+			wantError: errors.New("discord-api-key: invalid regex secret group 5, max regex secret group 3"),
 		},
 	}
 	for _, tt := range tests {
@@ -206,22 +206,22 @@ func TestTranslateAllowlists(t *testing.T) {
 		{
 			cfgName:   "invalid/allowlist_global_empty",
 			cfg:       Config{},
-			wantError: fmt.Errorf("[[allowlists]] must contain at least one check for: commits, paths, regexes, or stopwords"),
+			wantError: errors.New("[[allowlists]] must contain at least one check for: commits, paths, regexes, or stopwords"),
 		},
 		{
 			cfgName:   "invalid/allowlist_global_old_and_new",
 			cfg:       Config{},
-			wantError: fmt.Errorf("[allowlist] is deprecated, it cannot be used alongside [[allowlists]]"),
+			wantError: errors.New("[allowlist] is deprecated, it cannot be used alongside [[allowlists]]"),
 		},
 		{
 			cfgName:   "invalid/allowlist_global_target_rule_id",
 			cfg:       Config{},
-			wantError: fmt.Errorf("[[allowlists]] target rule ID 'github-pat' does not exist"),
+			wantError: errors.New("[[allowlists]] target rule ID 'github-pat' does not exist"),
 		},
 		{
 			cfgName:   "invalid/allowlist_global_regextarget",
 			cfg:       Config{},
-			wantError: fmt.Errorf("[[allowlists]] unknown allowlist |regexTarget| 'mtach' (expected 'match', 'line')"),
+			wantError: errors.New("[[allowlists]] unknown allowlist |regexTarget| 'mtach' (expected 'match', 'line')"),
 		},
 
 		// Rule
@@ -302,17 +302,17 @@ func TestTranslateAllowlists(t *testing.T) {
 		{
 			cfgName:   "invalid/allowlist_rule_empty",
 			cfg:       Config{},
-			wantError: fmt.Errorf("example: [[rules.allowlists]] must contain at least one check for: commits, paths, regexes, or stopwords"),
+			wantError: errors.New("example: [[rules.allowlists]] must contain at least one check for: commits, paths, regexes, or stopwords"),
 		},
 		{
 			cfgName:   "invalid/allowlist_rule_old_and_new",
 			cfg:       Config{},
-			wantError: fmt.Errorf("example: [rules.allowlist] is deprecated, it cannot be used alongside [[rules.allowlist]]"),
+			wantError: errors.New("example: [rules.allowlist] is deprecated, it cannot be used alongside [[rules.allowlist]]"),
 		},
 		{
 			cfgName:   "invalid/allowlist_rule_regextarget",
 			cfg:       Config{},
-			wantError: fmt.Errorf("example: [[rules.allowlists]] unknown allowlist |regexTarget| 'mtach' (expected 'match', 'line')"),
+			wantError: errors.New("example: [[rules.allowlists]] unknown allowlist |regexTarget| 'mtach' (expected 'match', 'line')"),
 		},
 	}
 

+ 2 - 1
config/gitleaks.toml

@@ -20,7 +20,7 @@ paths = [
     '''gitleaks\.toml''',
     '''(?i)\.(?:bmp|gif|jpe?g|png|svg|tiff?)$''',
     '''(?i)\.(?:eot|[ot]tf|woff2?)$''',
-    '''(?i)\.(?:docx?|xlsx?|pdf|bin|socket|vsidx|v2|suo|wsuo|.dll|pdb|exe|gltf|zip)$''',
+    '''(?i)\.(?:docx?|xlsx?|pdf|bin|socket|vsidx|v2|suo|wsuo|.dll|pdb|exe|gltf)$''',
     '''go\.(?:mod|sum|work(?:\.sum)?)$''',
     '''(?:^|/)vendor/modules\.txt$''',
     '''(?:^|/)vendor/(?:github\.com|golang\.org/x|google\.golang\.org|gopkg\.in|istio\.io|k8s\.io|sigs\.k8s\.io)(?:/.*)?$''',
@@ -41,6 +41,7 @@ paths = [
     '''\.gem$''',
     '''verification-metadata\.xml''',
     '''Database.refactorlog''',
+    '''(?:^|/)\.git$''',
 ]
 regexes = [
     '''(?i)^true|false|null$''',
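Review note: the new allowlist entry `(?:^|/)\.git$` only matches a path whose final component is exactly `.git`. The sketch below checks a few paths against it using Go's standard `regexp` package. That choice is an assumption for the example; gitleaks routes patterns through its own `regexp` wrapper package.

```go
package main

import (
	"fmt"
	"regexp"
)

// dotGit is the path pattern added to the allowlist in this diff.
var dotGit = regexp.MustCompile(`(?:^|/)\.git$`)

// isDotGit reports whether the last component of path is exactly ".git".
func isDotGit(path string) bool {
	return dotGit.MatchString(path)
}

func main() {
	paths := []string{".git", "sub/module/.git", "repo.git", ".gitignore", ".git/config"}
	for _, p := range paths {
		fmt.Printf("%-16s %v\n", p, isDotGit(p))
	}
}
```

Only the first two paths match. `repo.git` fails because `.git` must follow the start of the string or a `/`, and the other two fail on the `$` anchor.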

+ 3 - 2
config/rule.go

@@ -1,6 +1,7 @@
 package config
 
 import (
+	"errors"
 	"fmt"
 	"strings"
 
@@ -64,12 +65,12 @@ func (r *Rule) Validate() error {
 		} else if r.Description != "" {
 			context = ", description: " + r.Description
 		}
-		return fmt.Errorf("rule |id| is missing or empty" + context)
+		return errors.New("rule |id| is missing or empty" + context)
 	}
 
 	// Ensure the rule actually matches something.
 	if r.Regex == nil && r.Path == nil {
-		return fmt.Errorf("%s: both |regex| and |path| are empty, this rule will have no effect", r.RuleID)
+		return errors.New(r.RuleID + ": both |regex| and |path| are empty, this rule will have no effect")
 	}
 
 	// Ensure |secretGroup| works.
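Review note: the first replaced call passed a concatenated string as the format argument of `fmt.Errorf`. Any `%` in the appended context (a regex, for example) would then be read as a formatting verb. Building the message with `errors.New` keeps the text verbatim. A small sketch, with a hypothetical helper name:

```go
package main

import (
	"errors"
	"fmt"
)

// missingID builds the error by plain concatenation, so the text is
// never interpreted as a format string.
func missingID(context string) error {
	return errors.New("rule |id| is missing or empty" + context)
}

func main() {
	// A regex in the context may legitimately contain '%'.
	err := missingID(", regex: [a-z]%20[0-9]+")
	fmt.Println(err)
}
```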

+ 1 - 1
detect/codec/hex.go

@@ -49,7 +49,7 @@ func decodeHex(encodedValue string) string {
 		if n1|n2 == '\xff' {
 			return ""
 		}
-		b := byte(n1<<4 | n2)
+		b := n1<<4 | n2
 		if !printableASCII[b] {
 			return ""
 		}

+ 1 - 1
detect/codec/percent.go

@@ -13,7 +13,7 @@ func decodePercent(encodedValue string) string {
 			n2 := hexMap[encodedValue[encIndex+2]]
 			// Make sure they're hex characters
 			if n1|n2 != '\xff' {
-				b := byte(n1<<4 | n2)
+				b := n1<<4 | n2
 				if !printableASCII[b] {
 					return ""
 				}
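Review note: the hex and percent decoders drop a redundant `byte(...)` conversion, since both nibbles are already bytes. The sketch below shows the nibble-combining step on its own. `hexValue` is a hypothetical replacement for the lookup table in the diff; only the `0xff` invalid marker is taken from the code shown.

```go
package main

import "fmt"

// hexValue returns the value of a hex digit, or 0xff if c is not one.
func hexValue(c byte) byte {
	switch {
	case c >= '0' && c <= '9':
		return c - '0'
	case c >= 'a' && c <= 'f':
		return c - 'a' + 10
	case c >= 'A' && c <= 'F':
		return c - 'A' + 10
	}
	return 0xff
}

// decodePair combines two hex digits into one byte.
func decodePair(hi, lo byte) (byte, bool) {
	n1, n2 := hexValue(hi), hexValue(lo)
	// Valid nibbles are at most 0x0f, so the OR can only be 0xff
	// when at least one digit was invalid.
	if n1|n2 == 0xff {
		return 0, false
	}
	// n1 and n2 are bytes, so the expression is a byte as well.
	return n1<<4 | n2, true
}

func main() {
	b, ok := decodePair('4', '1')
	fmt.Printf("%q %v\n", b, ok)
}
```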

+ 1 - 1
detect/codec/start_end.go

@@ -43,7 +43,7 @@ func (s startEnd) overflow(o startEnd) startEnd {
 	return s.merge(o).sub(s)
 }
 
-// merge takes two start/ends and returns a single one that encompases both
+// merge takes two start/ends and returns a single one that encompasses both
 func (s startEnd) merge(o startEnd) startEnd {
 	return startEnd{
 		min(s.start, o.start),

+ 104 - 38
detect/detect.go

@@ -5,7 +5,6 @@ import (
 	"context"
 	"fmt"
 	"os"
-	"runtime"
 	"strings"
 	"sync"
 	"sync/atomic"
@@ -16,6 +15,7 @@ import (
 	"github.com/zricethezav/gitleaks/v8/logging"
 	"github.com/zricethezav/gitleaks/v8/regexp"
 	"github.com/zricethezav/gitleaks/v8/report"
+	"github.com/zricethezav/gitleaks/v8/sources"
 
 	ahocorasick "github.com/BobuSumisu/aho-corasick"
 	"github.com/fatih/semgroup"
@@ -26,8 +26,6 @@ import (
 
 const (
 	gitleaksAllowSignature = "gitleaks:allow"
-	chunkSize              = 100 * 1_000 // 100kb
-
 	// SlowWarningThreshold is the amount of time to wait before logging that a file is slow.
 	// This is useful for identifying problematic files and tuning the allowlist.
 	SlowWarningThreshold = 5 * time.Second
@@ -35,7 +33,6 @@ const (
 
 var (
 	newLineRegexp = regexp.MustCompile("\n")
-	isWindows     = runtime.GOOS == "windows"
 )
 
 // Detector is the main detector struct
@@ -54,6 +51,9 @@ type Detector struct {
 	// MaxDecodeDepths limits how many recursive decoding passes are allowed
 	MaxDecodeDepth int
 
+	// MaxArchiveDepth limits how deep the sources will explore nested archives
+	MaxArchiveDepth int
+
 	// files larger than this will be skipped
 	MaxTargetMegaBytes int
 
@@ -66,6 +66,10 @@ type Detector struct {
 	// IgnoreGitleaksAllow is a flag to ignore gitleaks:allow comments.
 	IgnoreGitleaksAllow bool
 
+	// commitMutex is to prevent concurrent access to the
+	// commit map when adding commits
+	commitMutex *sync.Mutex
+
 	// commitMap is used to keep track of commits that have been scanned.
 	// This is only used for logging purposes and git scans.
 	commitMap map[string]bool
@@ -102,28 +106,10 @@ type Detector struct {
 	TotalBytes atomic.Uint64
 }
 
-// Fragment contains the data to be scanned
-type Fragment struct {
-	// Raw is the raw content of the fragment
-	Raw string
-
-	Bytes []byte
-
-	// FilePath is the path to the file, if applicable.
-	// The path separator MUST be normalized to `/`.
-	FilePath    string
-	SymlinkFile string
-	// WindowsFilePath is the path with the original separator.
-	// This provides a backwards-compatible solution to https://github.com/gitleaks/gitleaks/issues/1565.
-	WindowsFilePath string `json:"-"` // TODO: remove this in v9.
-
-	// CommitSHA is the SHA of the commit if applicable
-	CommitSHA string
-
-	// newlineIndices is a list of indices of newlines in the raw content.
-	// This is used to calculate the line location of a finding
-	newlineIndices [][]int
-}
+// Fragment is an alias for sources.Fragment for backwards compatibility
+//
+// Deprecated: This will be replaced with sources.Fragment in v9
+type Fragment sources.Fragment
 
 // NewDetector creates a new detector with the given config
 func NewDetector(cfg config.Config) *Detector {
@@ -131,6 +117,7 @@ func NewDetector(cfg config.Config) *Detector {
 		commitMap:      make(map[string]bool),
 		gitleaksIgnore: make(map[string]struct{}),
 		findingMutex:   &sync.Mutex{},
+		commitMutex:    &sync.Mutex{},
 		findings:       make([]report.Finding, 0),
 		Config:         cfg,
 		prefilter:      *ahocorasick.NewTrieBuilder().AddStrings(maps.Keys(cfg.Keywords)).Build(),
@@ -158,7 +145,7 @@ func NewDetectorDefaultConfig() (*Detector, error) {
 }
 
 func (d *Detector) AddGitleaksIgnore(gitleaksIgnorePath string) error {
-	logging.Debug().Msgf("found .gitleaksignore file: %s", gitleaksIgnorePath)
+	logging.Debug().Str("path", gitleaksIgnorePath).Msgf("found .gitleaksignore file")
 	file, err := os.Open(gitleaksIgnorePath)
 	if err != nil {
 		return err
@@ -166,7 +153,7 @@ func (d *Detector) AddGitleaksIgnore(gitleaksIgnorePath string) error {
 	defer func() {
 		// https://github.com/securego/gosec/issues/512
 		if err := file.Close(); err != nil {
-			logging.Warn().Msgf("Error closing .gitleaksignore file: %s\n", err)
+			logging.Warn().Err(err).Msgf("Error closing .gitleaksignore file")
 		}
 	}()
 
@@ -211,6 +198,68 @@ func (d *Detector) DetectString(content string) []report.Finding {
 	})
 }
 
+// DetectSource scans the given source and returns a list of findings
+func (d *Detector) DetectSource(ctx context.Context, source sources.Source) ([]report.Finding, error) {
+	err := source.Fragments(ctx, func(fragment sources.Fragment, err error) error {
+		logContext := logging.With()
+
+		if len(fragment.FilePath) > 0 {
+			logContext = logContext.Str("path", fragment.FilePath)
+		}
+
+		if len(fragment.CommitSHA) > 6 {
+			logContext = logContext.Str("commit", fragment.CommitSHA[:7])
+			d.addCommit(fragment.CommitSHA)
+		} else if len(fragment.CommitSHA) > 0 {
+			logContext = logContext.Str("commit", fragment.CommitSHA)
+			d.addCommit(fragment.CommitSHA)
+			logger := logContext.Logger()
+			logger.Warn().Msg("commit SHAs should be >= 7 characters long")
+		}
+
+		logger := logContext.Logger()
+
+		if err != nil {
+			// Log the error and move on to the next fragment
+			logger.Error().Err(err).Send()
+			return nil
+		}
+
+		// both the fragment's content and path should be empty for it to be
+		// considered empty at this point because of path based matches
+		if len(fragment.Raw) == 0 && len(fragment.FilePath) == 0 {
+			logger.Trace().Msg("skipping empty fragment")
+			return nil
+		}
+
+		var timer *time.Timer
+		// Only start the timer in debug mode
+		if logger.GetLevel() <= zerolog.DebugLevel {
+			timer = time.AfterFunc(SlowWarningThreshold, func() {
+				logger.Debug().Msgf("Taking longer than %s to inspect fragment", SlowWarningThreshold.String())
+			})
+		}
+
+		for _, finding := range d.Detect(Fragment(fragment)) {
+			d.AddFinding(finding)
+		}
+
+		// Stop the timer if it was created
+		if timer != nil {
+			timer.Stop()
+		}
+
+		return nil
+	})
+
+	if _, isGit := source.(*sources.Git); isGit {
+		logging.Info().Msgf("%d commits scanned.", len(d.commitMap))
+		logging.Debug().Msg("Note: this number might be smaller than expected due to commits with no additions")
+	}
+
+	return d.Findings(), err
+}
+
 // Detect scans the given fragment and returns a list of findings
 func (d *Detector) Detect(fragment Fragment) []report.Finding {
 	if fragment.Bytes == nil {
@@ -244,7 +293,7 @@ func (d *Detector) Detect(fragment Fragment) []report.Finding {
 	}
 
 	// add newline indices for location calculation in detectRule
-	fragment.newlineIndices = newLineRegexp.FindAllStringIndex(fragment.Raw, -1)
+	newlineIndices := newLineRegexp.FindAllStringIndex(fragment.Raw, -1)
 
 	// setup variables to handle different decoding passes
 	currentRaw := fragment.Raw
@@ -265,14 +314,14 @@ func (d *Detector) Detect(fragment Fragment) []report.Finding {
 			if len(rule.Keywords) == 0 {
 				// if no keywords are associated with the rule always scan the
 				// fragment using the rule
-				findings = append(findings, d.detectRule(fragment, currentRaw, rule, encodedSegments)...)
+				findings = append(findings, d.detectRule(fragment, newlineIndices, currentRaw, rule, encodedSegments)...)
 				continue
 			}
 
 			// check if keywords are in the fragment
 			for _, k := range rule.Keywords {
 				if _, ok := keywords[strings.ToLower(k)]; ok {
-					findings = append(findings, d.detectRule(fragment, currentRaw, rule, encodedSegments)...)
+					findings = append(findings, d.detectRule(fragment, newlineIndices, currentRaw, rule, encodedSegments)...)
 					break
 				}
 			}
@@ -299,7 +348,7 @@ func (d *Detector) Detect(fragment Fragment) []report.Finding {
 }
 
 // detectRule scans the given fragment for the given rule and returns a list of findings
-func (d *Detector) detectRule(fragment Fragment, currentRaw string, r config.Rule, encodedSegments []*codec.EncodedSegment) []report.Finding {
+func (d *Detector) detectRule(fragment Fragment, newlineIndices [][]int, currentRaw string, r config.Rule, encodedSegments []*codec.EncodedSegment) []report.Finding {
 	var (
 		findings []report.Finding
 		logger   = func() zerolog.Logger {
@@ -322,13 +371,21 @@ func (d *Detector) detectRule(fragment Fragment, currentRaw string, r config.Rul
 			// Path _only_ rule
 			if r.Path.MatchString(fragment.FilePath) || (fragment.WindowsFilePath != "" && r.Path.MatchString(fragment.WindowsFilePath)) {
 				finding := report.Finding{
+					Commit:      fragment.CommitSHA,
 					RuleID:      r.RuleID,
 					Description: r.Description,
 					File:        fragment.FilePath,
 					SymlinkFile: fragment.SymlinkFile,
-					Match:       fmt.Sprintf("file detected: %s", fragment.FilePath),
+					Match:       "file detected: " + fragment.FilePath,
 					Tags:        r.Tags,
 				}
+				if fragment.CommitInfo != nil {
+					finding.Author = fragment.CommitInfo.AuthorName
+					finding.Date = fragment.CommitInfo.Date
+					finding.Email = fragment.CommitInfo.AuthorEmail
+					finding.Link = createScmLink(fragment.CommitInfo.Remote, finding)
+					finding.Message = fragment.CommitInfo.Message
+				}
 				return append(findings, finding)
 			}
 		} else {
@@ -348,7 +405,7 @@ func (d *Detector) detectRule(fragment Fragment, currentRaw string, r config.Rul
 
 	// if flag configure and raw data size bigger then the flag
 	if d.MaxTargetMegaBytes > 0 {
-		rawLength := len(currentRaw) / 1000000
+		rawLength := len(currentRaw) / 1_000_000
 		if rawLength > d.MaxTargetMegaBytes {
 			logger.Debug().
 				Int("size", rawLength).
@@ -390,17 +447,18 @@ func (d *Detector) detectRule(fragment Fragment, currentRaw string, r config.Rul
 		// in the finding will be the line/column numbers of the _match_
 		// not the _secret_, which will be different if the secretGroup
 		// value is set for this rule
-		loc := location(fragment, matchIndex)
+		loc := location(newlineIndices, fragment.Raw, matchIndex)
 
 		if matchIndex[1] > loc.endLineIndex {
 			loc.endLineIndex = matchIndex[1]
 		}
 
 		finding := report.Finding{
+			Commit:      fragment.CommitSHA,
 			RuleID:      r.RuleID,
 			Description: r.Description,
-			StartLine:   loc.startLine,
-			EndLine:     loc.endLine,
+			StartLine:   fragment.StartLine + loc.startLine,
+			EndLine:     fragment.StartLine + loc.endLine,
 			StartColumn: loc.startColumn,
 			EndColumn:   loc.endColumn,
 			Line:        fragment.Raw[loc.startLineIndex:loc.endLineIndex],
@@ -410,7 +468,13 @@ func (d *Detector) detectRule(fragment Fragment, currentRaw string, r config.Rul
 			SymlinkFile: fragment.SymlinkFile,
 			Tags:        append(r.Tags, metaTags...),
 		}
-
+		if fragment.CommitInfo != nil {
+			finding.Author = fragment.CommitInfo.AuthorName
+			finding.Date = fragment.CommitInfo.Date
+			finding.Email = fragment.CommitInfo.AuthorEmail
+			finding.Link = createScmLink(fragment.CommitInfo.Remote, finding)
+			finding.Message = fragment.CommitInfo.Message
+		}
 		if !d.IgnoreGitleaksAllow && strings.Contains(finding.Line, gitleaksAllowSignature) {
 			logger.Trace().
 				Str("finding", finding.Secret).
@@ -521,7 +585,9 @@ func (d *Detector) Findings() []report.Finding {
 
 // AddCommit synchronously adds a commit to the commit slice
 func (d *Detector) addCommit(commit string) {
+	d.commitMutex.Lock()
 	d.commitMap[commit] = true
+	d.commitMutex.Unlock()
 }
 
 // checkCommitOrPathAllowed evaluates |fragment| against all provided |allowlists|.
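Review note: `addCommit` now takes `commitMutex` before writing to the map. Go maps are not safe for concurrent writes, and `DetectSource` may call `addCommit` from several goroutines. A minimal sketch of the same guard follows; `commitSet` is a hypothetical stand-in for the detector fields.

```go
package main

import (
	"fmt"
	"sync"
)

// commitSet is a hypothetical stand-in for the detector's commitMap
// together with the commitMutex added in this diff.
type commitSet struct {
	mu      sync.Mutex
	commits map[string]bool
}

func newCommitSet() *commitSet {
	return &commitSet{commits: make(map[string]bool)}
}

// add records a commit; the lock makes concurrent calls safe.
func (c *commitSet) add(sha string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.commits[sha] = true
}

// size returns the number of distinct commits seen so far.
func (c *commitSet) size() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.commits)
}

// addConcurrently calls add for every SHA from its own goroutine.
func addConcurrently(c *commitSet, shas []string) {
	var wg sync.WaitGroup
	for _, sha := range shas {
		wg.Add(1)
		go func(s string) {
			defer wg.Done()
			c.add(s)
		}(sha)
	}
	wg.Wait()
}

func main() {
	c := newCommitSet()
	addConcurrently(c, []string{"aaa1111", "bbb2222", "aaa1111"})
	fmt.Println(c.size())
}
```

Note that the diff reads `len(d.commitMap)` without the lock at the end of `DetectSource`. That is only safe if all fragment callbacks have finished by then.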

The changes were suppressed because the diff is too large
+ 930 - 2
detect/detect_test.go


+ 0 - 203
detect/directory.go

@@ -1,203 +0,0 @@
-package detect
-
-import (
-	"bufio"
-	"bytes"
-	"io"
-	"os"
-	"path/filepath"
-	"strings"
-	"time"
-
-	"github.com/h2non/filetype"
-
-	"github.com/zricethezav/gitleaks/v8/logging"
-	"github.com/zricethezav/gitleaks/v8/report"
-	"github.com/zricethezav/gitleaks/v8/sources"
-)
-
-const maxPeekSize = 25 * 1_000 // 10kb
-
-func (d *Detector) DetectFiles(paths <-chan sources.ScanTarget) ([]report.Finding, error) {
-	for pa := range paths {
-		d.Sema.Go(func() error {
-			logger := logging.With().Str("path", pa.Path).Logger()
-			logger.Trace().Msg("Scanning path")
-
-			f, err := os.Open(pa.Path)
-			if err != nil {
-				if os.IsPermission(err) {
-					logger.Warn().Msg("Skipping file: permission denied")
-					return nil
-				}
-				return err
-			}
-			defer func() {
-				_ = f.Close()
-			}()
-
-			// Get file size
-			fileInfo, err := f.Stat()
-			if err != nil {
-				return err
-			}
-			fileSize := fileInfo.Size()
-			if d.MaxTargetMegaBytes > 0 {
-				rawLength := fileSize / 1000000
-				if rawLength > int64(d.MaxTargetMegaBytes) {
-					logger.Debug().
-						Int64("size", rawLength).
-						Msg("Skipping file: exceeds --max-target-megabytes")
-					return nil
-				}
-			}
-
-			var (
-				// Buffer to hold file chunks
-				reader     = bufio.NewReaderSize(f, chunkSize)
-				buf        = make([]byte, chunkSize)
-				totalLines = 0
-			)
-			for {
-				n, err := reader.Read(buf)
-
-				// "Callers should always process the n > 0 bytes returned before considering the error err."
-				// https://pkg.go.dev/io#Reader
-				if n > 0 {
-					// Only check the filetype at the start of file.
-					if totalLines == 0 {
-						// TODO: could other optimizations be introduced here?
-						if mimetype, err := filetype.Match(buf[:n]); err != nil {
-							return nil
-						} else if mimetype.MIME.Type == "application" {
-							return nil // skip binary files
-						}
-					}
-
-					// Try to split chunks across large areas of whitespace, if possible.
-					peekBuf := bytes.NewBuffer(buf[:n])
-					if readErr := readUntilSafeBoundary(reader, n, maxPeekSize, peekBuf); readErr != nil {
-						return readErr
-					}
-
-					// Count the number of newlines in this chunk
-					chunk := peekBuf.String()
-					linesInChunk := strings.Count(chunk, "\n")
-					totalLines += linesInChunk
-					fragment := Fragment{
-						Raw:   chunk,
-						Bytes: peekBuf.Bytes(),
-					}
-					if pa.Symlink != "" {
-						fragment.SymlinkFile = pa.Symlink
-					}
-
-					if isWindows {
-						fragment.FilePath = filepath.ToSlash(pa.Path)
-						fragment.SymlinkFile = filepath.ToSlash(fragment.SymlinkFile)
-						fragment.WindowsFilePath = pa.Path
-					} else {
-						fragment.FilePath = pa.Path
-					}
-
-					timer := time.AfterFunc(SlowWarningThreshold, func() {
-						logger.Debug().Msgf("Taking longer than %s to inspect fragment", SlowWarningThreshold.String())
-					})
-					for _, finding := range d.Detect(fragment) {
-						// need to add 1 since line counting starts at 1
-						finding.StartLine += (totalLines - linesInChunk) + 1
-						finding.EndLine += (totalLines - linesInChunk) + 1
-						d.AddFinding(finding)
-					}
-					if timer != nil {
-						timer.Stop()
-						timer = nil
-					}
-				}
-
-				if err != nil {
-					if err == io.EOF {
-						return nil
-					}
-					return err
-				}
-			}
-		})
-	}
-
-	if err := d.Sema.Wait(); err != nil {
-		return d.findings, err
-	}
-
-	return d.findings, nil
-}
-
-// readUntilSafeBoundary consumes |f| until it finds two consecutive `\n` characters, up to |maxPeekSize|.
-// This hopefully avoids splitting. (https://github.com/gitleaks/gitleaks/issues/1651)
-func readUntilSafeBoundary(r *bufio.Reader, n int, maxPeekSize int, peekBuf *bytes.Buffer) error {
-	if peekBuf.Len() == 0 {
-		return nil
-	}
-
-	// Does the buffer end in consecutive newlines?
-	var (
-		data         = peekBuf.Bytes()
-		lastChar     = data[len(data)-1]
-		newlineCount = 0 // Tracks consecutive newlines
-	)
-	if isWhitespace(lastChar) {
-		for i := len(data) - 1; i >= 0; i-- {
-			lastChar = data[i]
-			if lastChar == '\n' {
-				newlineCount++
-
-				// Stop if two consecutive newlines are found
-				if newlineCount >= 2 {
-					return nil
-				}
-			} else if lastChar == '\r' || lastChar == ' ' || lastChar == '\t' {
-				// The presence of other whitespace characters (`\r`, ` `, `\t`) shouldn't reset the count.
-				// (Intentionally do nothing.)
-			} else {
-				break
-			}
-		}
-	}
-
-	// If not, read ahead until we (hopefully) find some.
-	newlineCount = 0
-	for {
-		data = peekBuf.Bytes()
-		// Check if the last character is a newline.
-		lastChar = data[len(data)-1]
-		if lastChar == '\n' {
-			newlineCount++
-
-			// Stop if two consecutive newlines are found
-			if newlineCount >= 2 {
-				break
-			}
-		} else if lastChar == '\r' || lastChar == ' ' || lastChar == '\t' {
-			// The presence of other whitespace characters (`\r`, ` `, `\t`) shouldn't reset the count.
-			// (Intentionally do nothing.)
-		} else {
-			newlineCount = 0 // Reset if a non-newline character is found
-		}
-
-		// Stop growing the buffer if it reaches maxSize
-		if (peekBuf.Len() - n) >= maxPeekSize {
-			break
-		}
-
-		// Read additional data into a temporary buffer
-		b, err := r.ReadByte()
-		if err != nil {
-			if err == io.EOF {
-				break
-			}
-			return err
-		}
-		peekBuf.WriteByte(b)
-	}
-	return nil
-}

+ 92 - 0
detect/files.go

@@ -0,0 +1,92 @@
+package detect
+
+import (
+	"context"
+	"errors"
+	"os"
+	"sync"
+
+	"github.com/zricethezav/gitleaks/v8/logging"
+	"github.com/zricethezav/gitleaks/v8/report"
+	"github.com/zricethezav/gitleaks/v8/sources"
+)
+
+// DetectFiles runs detections against a channel of scan targets
+//
+// Deprecated: Use sources.Files and Detector.DetectSource instead
+func (d *Detector) DetectFiles(scanTargets <-chan sources.ScanTarget) ([]report.Finding, error) {
+	var wg sync.WaitGroup
+
+	for scanTarget := range scanTargets {
+		wg.Add(1)
+
+		d.Sema.Go(func() error {
+			defer wg.Done()
+
+			logger := logging.With().Str("path", scanTarget.Path).Logger()
+			logger.Trace().Msg("scanning path")
+
+			f, err := os.Open(scanTarget.Path)
+			if err != nil {
+				if os.IsPermission(err) {
+					err = errors.New("permission denied")
+				}
+
+				logger.Warn().Err(err).Msg("skipping file")
+				return nil
+			}
+			defer func() {
+				_ = f.Close()
+			}()
+
+			info, err := f.Stat()
+			if err != nil {
+				logger.Error().Err(err).Msg("skipping file: could not get info")
+				return nil
+			}
+
+			// Empty; nothing to do here.
+			if info.Size() == 0 {
+				logger.Debug().Msg("skipping empty file")
+				return nil
+			}
+
+			// Too large; nothing to do here.
+			if d.MaxTargetMegaBytes > 0 {
+				rawLength := info.Size() / 1_000_000
+				if rawLength > int64(d.MaxTargetMegaBytes) {
+					logger.Warn().Msgf(
+						"skipping file: too large max_size=%dMB, size=%dMB",
+						d.MaxTargetMegaBytes, rawLength,
+					)
+					return nil
+				}
+			}
+
+			// Convert this to a file source
+			file := sources.File{
+				Content:         f,
+				Path:            scanTarget.Path,
+				Symlink:         scanTarget.Symlink,
+				Config:          &d.Config,
+				MaxArchiveDepth: d.MaxArchiveDepth,
+			}
+
+			ctx := context.Background()
+			return file.Fragments(ctx, func(fragment sources.Fragment, err error) error {
+				if err != nil {
+					logging.Error().Err(err).Send()
+					return nil
+				}
+
+				for _, finding := range d.Detect(Fragment(fragment)) {
+					d.AddFinding(finding)
+				}
+				return nil
+			})
+		})
+	}
+
+	wg.Wait()
+	return d.findings, nil
+}
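Review note: the size check converts bytes to whole megabytes with integer division before comparing, so a file is skipped only once it is at least one full megabyte over the limit. The sketch below isolates that arithmetic; `exceedsLimit` is a hypothetical helper, not a function in the diff.

```go
package main

import "fmt"

// exceedsLimit mirrors the size check in the diff: the size is converted
// to whole megabytes (1 MB = 1,000,000 bytes) with integer division and
// then compared against the limit. A limit of 0 disables the check.
func exceedsLimit(sizeBytes int64, maxMegaBytes int) bool {
	if maxMegaBytes <= 0 {
		return false
	}
	return sizeBytes/1_000_000 > int64(maxMegaBytes)
}

func main() {
	fmt.Println(exceedsLimit(1_999_999, 1))  // truncates to 1 MB: not skipped
	fmt.Println(exceedsLimit(2_000_000, 1))  // 2 MB: skipped
	fmt.Println(exceedsLimit(50_000_000, 0)) // limit disabled: not skipped
}
```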

+ 20 - 177
detect/git.go

@@ -1,192 +1,35 @@
 package detect
 
 import (
-	"bytes"
-	"errors"
-	"fmt"
-	"net/url"
-	"os/exec"
-	"regexp"
-	"strings"
-	"time"
-
-	"github.com/gitleaks/go-gitdiff/gitdiff"
+	"context"
 
 	"github.com/zricethezav/gitleaks/v8/cmd/scm"
-	"github.com/zricethezav/gitleaks/v8/logging"
 	"github.com/zricethezav/gitleaks/v8/report"
 	"github.com/zricethezav/gitleaks/v8/sources"
 )
 
+// RemoteInfo is an alias for sources.RemoteInfo for backwards compatibility
+//
+// Deprecated: This will be replaced with sources.RemoteInfo in v9
+type RemoteInfo sources.RemoteInfo
+
+// DetectGit runs detections against a GitCmd with its remote info
+//
+// Deprecated: Use sources.Git and detector.DetectSource instead
 func (d *Detector) DetectGit(cmd *sources.GitCmd, remote *RemoteInfo) ([]report.Finding, error) {
-	defer cmd.Wait()
-	var (
-		diffFilesCh = cmd.DiffFilesCh()
-		errCh       = cmd.ErrCh()
+	return d.DetectSource(
+		context.Background(),
+		&sources.Git{
+			Cmd:             cmd,
+			Config:          &d.Config,
+			Remote:          (*sources.RemoteInfo)(remote),
+			Sema:            d.Sema,
+			MaxArchiveDepth: d.MaxArchiveDepth,
+		},
 	)
-
-	// loop to range over both DiffFiles (stdout) and ErrCh (stderr)
-	for diffFilesCh != nil || errCh != nil {
-		select {
-		case gitdiffFile, open := <-diffFilesCh:
-			if !open {
-				diffFilesCh = nil
-				break
-			}
-
-			// skip binary files
-			if gitdiffFile.IsBinary || gitdiffFile.IsDelete {
-				continue
-			}
-
-			// Check if commit is allowed
-			commitSHA := ""
-			if gitdiffFile.PatchHeader != nil {
-				commitSHA = gitdiffFile.PatchHeader.SHA
-				for _, a := range d.Config.Allowlists {
-					if ok, c := a.CommitAllowed(gitdiffFile.PatchHeader.SHA); ok {
-						logging.Trace().Str("allowed-commit", c).Msg("skipping commit: global allowlist")
-						continue
-					}
-				}
-			}
-			d.addCommit(commitSHA)
-
-			d.Sema.Go(func() error {
-				for _, textFragment := range gitdiffFile.TextFragments {
-					if textFragment == nil {
-						return nil
-					}
-
-					fragment := Fragment{
-						Raw:       textFragment.Raw(gitdiff.OpAdd),
-						CommitSHA: commitSHA,
-						FilePath:  gitdiffFile.NewName,
-					}
-
-					timer := time.AfterFunc(SlowWarningThreshold, func() {
-						logging.Debug().
-							Str("commit", commitSHA[:7]).
-							Str("path", fragment.FilePath).
-							Msgf("Taking longer than %s to inspect fragment", SlowWarningThreshold.String())
-					})
-					for _, finding := range d.Detect(fragment) {
-						d.AddFinding(augmentGitFinding(remote, finding, textFragment, gitdiffFile))
-					}
-					if timer != nil {
-						timer.Stop()
-						timer = nil
-					}
-				}
-				return nil
-			})
-		case err, open := <-errCh:
-			if !open {
-				errCh = nil
-				break
-			}
-
-			return d.findings, err
-		}
-	}
-
-	if err := d.Sema.Wait(); err != nil {
-		return d.findings, err
-	}
-	logging.Info().Msgf("%d commits scanned.", len(d.commitMap))
-	logging.Debug().Msg("Note: this number might be smaller than expected due to commits with no additions")
-	return d.findings, nil
-}
-
-type RemoteInfo struct {
-	Platform scm.Platform
-	Url      string
 }
 
+// Deprecated: use sources.NewRemoteInfo instead
 func NewRemoteInfo(platform scm.Platform, source string) *RemoteInfo {
-	if platform == scm.NoPlatform {
-		return &RemoteInfo{Platform: platform}
-	}
-
-	remoteUrl, err := getRemoteUrl(source)
-	if err != nil {
-		if strings.Contains(err.Error(), "No remote configured") {
-			logging.Debug().Msg("skipping finding links: repository has no configured remote.")
-			platform = scm.NoPlatform
-		} else {
-			logging.Error().Err(err).Msg("skipping finding links: unable to parse remote URL")
-		}
-		goto End
-	}
-
-	if platform == scm.UnknownPlatform {
-		platform = platformFromHost(remoteUrl)
-		if platform == scm.UnknownPlatform {
-			logging.Info().
-				Str("host", remoteUrl.Hostname()).
-				Msg("Unknown SCM platform. Use --platform to include links in findings.")
-		} else {
-			logging.Debug().
-				Str("host", remoteUrl.Hostname()).
-				Str("platform", platform.String()).
-				Msg("SCM platform parsed from host")
-		}
-	}
-
-End:
-	var rUrl string
-	if remoteUrl != nil {
-		rUrl = remoteUrl.String()
-	}
-	return &RemoteInfo{
-		Platform: platform,
-		Url:      rUrl,
-	}
-}
-
-var sshUrlpat = regexp.MustCompile(`^git@([a-zA-Z0-9.-]+):([\w/.-]+?)(?:\.git)?$`)
-
-func getRemoteUrl(source string) (*url.URL, error) {
-	// This will return the first remote — typically, "origin".
-	cmd := exec.Command("git", "ls-remote", "--quiet", "--get-url")
-	if source != "." {
-		cmd.Dir = source
-	}
-
-	stdout, err := cmd.Output()
-	if err != nil {
-		var exitError *exec.ExitError
-		if errors.As(err, &exitError) {
-			return nil, fmt.Errorf("command failed (%d): %w, stderr: %s", exitError.ExitCode(), err, string(bytes.TrimSpace(exitError.Stderr)))
-		}
-		return nil, err
-	}
-
-	remoteUrl := string(bytes.TrimSpace(stdout))
-	if matches := sshUrlpat.FindStringSubmatch(remoteUrl); matches != nil {
-		remoteUrl = fmt.Sprintf("https://%s/%s", matches[1], matches[2])
-	}
-	remoteUrl = strings.TrimSuffix(remoteUrl, ".git")
-
-	parsedUrl, err := url.Parse(remoteUrl)
-	if err != nil {
-		return nil, fmt.Errorf("unable to parse remote URL: %w", err)
-	}
-
-	// Remove any user info.
-	parsedUrl.User = nil
-	return parsedUrl, nil
-}
-
-func platformFromHost(u *url.URL) scm.Platform {
-	switch strings.ToLower(u.Hostname()) {
-	case "github.com":
-		return scm.GitHubPlatform
-	case "gitlab.com":
-		return scm.GitLabPlatform
-	case "dev.azure.com", "visualstudio.com":
-		return scm.AzureDevOpsPlatform
-	default:
-		return scm.UnknownPlatform
-	}
+	return (*RemoteInfo)(sources.NewRemoteInfo(platform, source))
 }
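Review note: `type RemoteInfo sources.RemoteInfo` (like `type Fragment sources.Fragment` earlier) declares a new defined type, not a true alias (`type X = Y`). The two types share a layout, so values and pointers convert explicitly, but the new type does not inherit methods from the original. A sketch of the pointer conversion used by the deprecated wrappers, with hypothetical type names:

```go
package main

import "fmt"

// remoteInfo stands in for sources.RemoteInfo, and legacyRemoteInfo for
// the deprecated detect.RemoteInfo. Both names are hypothetical.
type remoteInfo struct {
	Platform string
	URL      string
}

// A defined type (no '='): same underlying struct, distinct type.
type legacyRemoteInfo remoteInfo

// newRemoteInfo plays the role of the new constructor.
func newRemoteInfo(platform, url string) *remoteInfo {
	return &remoteInfo{Platform: platform, URL: url}
}

// newLegacyRemoteInfo wraps it, as the deprecated NewRemoteInfo does.
func newLegacyRemoteInfo(platform, url string) *legacyRemoteInfo {
	return (*legacyRemoteInfo)(newRemoteInfo(platform, url))
}

func main() {
	legacy := newLegacyRemoteInfo("github", "https://example.com/org/repo")
	// Converting the pointer back does not copy the struct.
	modern := (*remoteInfo)(legacy)
	modern.URL = "https://example.com/other"
	fmt.Println(legacy.URL)
}
```

Both pointers refer to the same struct, so the wrapper adds no copy. Callers holding the old type do need an explicit conversion wherever the new type is expected.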

+ 8 - 8
detect/location.go

@@ -10,7 +10,7 @@ type Location struct {
 	endLineIndex   int
 }
 
-func location(fragment Fragment, matchIndex []int) Location {
+func location(newlineIndices [][]int, raw string, matchIndex []int) Location {
 	var (
 		prevNewLine int
 		location    Location
@@ -28,13 +28,13 @@ func location(fragment Fragment, matchIndex []int) Location {
 	// When a fragment does NOT have any newlines, a default "newline"
 	// will be counted to make the subsequent location calculation logic work
 	// for fragments with no newlines.
-	if len(fragment.newlineIndices) == 0 {
-		fragment.newlineIndices = [][]int{
-			{len(fragment.Raw), len(fragment.Raw) + 1},
+	if len(newlineIndices) == 0 {
+		newlineIndices = [][]int{
+			{len(raw), len(raw) + 1},
 		}
 	}
 
-	for lineNum, pair := range fragment.newlineIndices {
+	for lineNum, pair := range newlineIndices {
 		_lineNum = lineNum
 		newLineByteIndex := pair[0]
 		if prevNewLine <= start && start < newLineByteIndex {
@@ -65,11 +65,11 @@ func location(fragment Fragment, matchIndex []int) Location {
 
 		// search for new line byte index
 		i := 0
-		for end+i < len(fragment.Raw) {
-			if fragment.Raw[end+i] == '\n' {
+		for end+i < len(raw) {
+			if raw[end+i] == '\n' {
 				break
 			}
-			if fragment.Raw[end+i] == '\r' {
+			if raw[end+i] == '\r' {
 				break
 			}
 			i++

+ 1 - 1
detect/location_test.go

@@ -51,7 +51,7 @@ func TestGetLocation(t *testing.T) {
 	}
 
 	for _, test := range tests {
-		loc := location(Fragment{newlineIndices: test.linePairs}, []int{test.start, test.end})
+		loc := location(test.linePairs, "", []int{test.start, test.end})
 		assert.Equal(t, test.wantLocation, loc)
 	}
 }

+ 41 - 60
detect/reader.go

@@ -1,52 +1,41 @@
 package detect
 
 import (
-	"bufio"
-	"bytes"
-	"errors"
+	"context"
 	"io"
 
 	"github.com/zricethezav/gitleaks/v8/report"
+	"github.com/zricethezav/gitleaks/v8/sources"
 )
 
 // DetectReader accepts an io.Reader and a buffer size for the reader in KB
+//
+// Deprecated: Use sources.File with no path defined and Detector.DetectSource instead
 func (d *Detector) DetectReader(r io.Reader, bufSize int) ([]report.Finding, error) {
-	reader := bufio.NewReader(r)
-	buf := make([]byte, 1000*bufSize)
-	findings := []report.Finding{}
-
-	for {
-		n, err := reader.Read(buf)
-
-		// "Callers should always process the n > 0 bytes returned before considering the error err."
-		// https://pkg.go.dev/io#Reader
-		if n > 0 {
-			// Try to split chunks across large areas of whitespace, if possible.
-			peekBuf := bytes.NewBuffer(buf[:n])
-			if readErr := readUntilSafeBoundary(reader, n, maxPeekSize, peekBuf); readErr != nil {
-				return findings, readErr
-			}
+	var findings []report.Finding
+	file := sources.File{
+		Content:         r,
+		Buffer:          make([]byte, 1000*bufSize),
+		MaxArchiveDepth: d.MaxArchiveDepth,
+	}
 
-			fragment := Fragment{
-				Raw: peekBuf.String(),
-			}
-			for _, finding := range d.Detect(fragment) {
-				findings = append(findings, finding)
-				if d.Verbose {
-					printFinding(finding, d.NoColor)
-				}
-			}
+	ctx := context.Background()
+	err := file.Fragments(ctx, func(fragment sources.Fragment, err error) error {
+		if err != nil {
+			return err
 		}
 
-		if err != nil {
-			if err == io.EOF {
-				break
+		for _, finding := range d.Detect(Fragment(fragment)) {
+			findings = append(findings, finding)
+			if d.Verbose {
+				printFinding(finding, d.NoColor)
 			}
-			return findings, err
 		}
-	}
 
-	return findings, nil
+		return nil
+	})
+
+	return findings, err
 }
 
 // StreamDetectReader streams the detection results from the provided io.Reader.
@@ -82,45 +71,37 @@ func (d *Detector) DetectReader(r io.Reader, bufSize int) ([]report.Finding, err
 //	} else {
 //	    fmt.Println("Scanning completed successfully.")
 //	}
+//
+// Deprecated: Use sources.File.Fragments(context.Context, FragmentsFunc) instead
 func (d *Detector) StreamDetectReader(r io.Reader, bufSize int) (<-chan report.Finding, <-chan error) {
 	findingsCh := make(chan report.Finding, 1)
 	errCh := make(chan error, 1)
+	file := sources.File{
+		Content:         r,
+		Buffer:          make([]byte, 1000*bufSize),
+		MaxArchiveDepth: d.MaxArchiveDepth,
+	}
 
 	go func() {
 		defer close(findingsCh)
 		defer close(errCh)
 
-		reader := bufio.NewReader(r)
-		buf := make([]byte, 1000*bufSize)
-
-		for {
-			n, err := reader.Read(buf)
-
-			if n > 0 {
-				peekBuf := bytes.NewBuffer(buf[:n])
-				if readErr := readUntilSafeBoundary(reader, n, maxPeekSize, peekBuf); readErr != nil {
-					errCh <- readErr
-					return
-				}
-
-				fragment := Fragment{Raw: peekBuf.String()}
-				for _, finding := range d.Detect(fragment) {
-					findingsCh <- finding
-					if d.Verbose {
-						printFinding(finding, d.NoColor)
-					}
-				}
+		ctx := context.Background()
+		errCh <- file.Fragments(ctx, func(fragment sources.Fragment, err error) error {
+			if err != nil {
+				return err
 			}
 
-			if err != nil {
-				if errors.Is(err, io.EOF) {
-					errCh <- nil
-					return
+			for _, finding := range d.Detect(Fragment(fragment)) {
+				findingsCh <- finding
+				if d.Verbose {
+					printFinding(finding, d.NoColor)
 				}
-				errCh <- err
-				return
 			}
-		}
+
+			return nil
+		})
+
 	}()
 
 	return findingsCh, errCh

+ 23 - 42
detect/utils.go

@@ -6,59 +6,38 @@ import (
 	"math"
 	"path/filepath"
 	"strings"
-	"time"
 
 	"github.com/zricethezav/gitleaks/v8/cmd/scm"
 	"github.com/zricethezav/gitleaks/v8/logging"
 	"github.com/zricethezav/gitleaks/v8/report"
+	"github.com/zricethezav/gitleaks/v8/sources"
 
 	"github.com/charmbracelet/lipgloss"
-	"github.com/gitleaks/go-gitdiff/gitdiff"
 )
 
-// augmentGitFinding updates the start and end line numbers of a finding to include the
-// delta from the git diff
-func augmentGitFinding(remote *RemoteInfo, finding report.Finding, textFragment *gitdiff.TextFragment, f *gitdiff.File) report.Finding {
-	if !strings.HasPrefix(finding.Match, "file detected") {
-		finding.StartLine += int(textFragment.NewPosition)
-		finding.EndLine += int(textFragment.NewPosition)
-	}
-
-	if f.PatchHeader != nil {
-		finding.Commit = f.PatchHeader.SHA
-		if f.PatchHeader.Author != nil {
-			finding.Author = f.PatchHeader.Author.Name
-			finding.Email = f.PatchHeader.Author.Email
-		}
-		finding.Date = f.PatchHeader.AuthorDate.UTC().Format(time.RFC3339)
-		finding.Message = f.PatchHeader.Message()
-		// Results from `git diff` shouldn't have a link.
-		if finding.Commit != "" {
-			finding.Link = createScmLink(remote.Platform, remote.Url, finding)
-		}
-	}
-	return finding
-}
-
 var linkCleaner = strings.NewReplacer(
 	" ", "%20",
 	"%", "%25",
 )
 
-func createScmLink(scmPlatform scm.Platform, remoteUrl string, finding report.Finding) string {
-	if scmPlatform == scm.UnknownPlatform || scmPlatform == scm.NoPlatform {
+func createScmLink(remote *sources.RemoteInfo, finding report.Finding) string {
+	if remote.Platform == scm.UnknownPlatform ||
+		remote.Platform == scm.NoPlatform ||
+		finding.Commit == "" {
 		return ""
 	}
 
 	// Clean the path.
-	var (
-		filePath = linkCleaner.Replace(finding.File)
-		ext      = strings.ToLower(filepath.Ext(filePath))
-	)
+	filePath, _, hasInnerPath := strings.Cut(finding.File, sources.InnerPathSeparator)
+	filePath = linkCleaner.Replace(filePath)
 
-	switch scmPlatform {
+	switch remote.Platform {
 	case scm.GitHubPlatform:
-		link := fmt.Sprintf("%s/blob/%s/%s", remoteUrl, finding.Commit, filePath)
+		link := fmt.Sprintf("%s/blob/%s/%s", remote.Url, finding.Commit, filePath)
+		if hasInnerPath {
+			return link
+		}
+		ext := strings.ToLower(filepath.Ext(filePath))
 		if ext == ".ipynb" || ext == ".md" {
 			link += "?plain=1"
 		}
@@ -70,7 +49,10 @@ func createScmLink(scmPlatform scm.Platform, remoteUrl string, finding report.Fi
 		}
 		return link
 	case scm.GitLabPlatform:
-		link := fmt.Sprintf("%s/blob/%s/%s", remoteUrl, finding.Commit, filePath)
+		link := fmt.Sprintf("%s/blob/%s/%s", remote.Url, finding.Commit, filePath)
+		if hasInnerPath {
+			return link
+		}
 		if finding.StartLine != 0 {
 			link += fmt.Sprintf("#L%d", finding.StartLine)
 		}
@@ -79,8 +61,11 @@ func createScmLink(scmPlatform scm.Platform, remoteUrl string, finding report.Fi
 		}
 		return link
 	case scm.AzureDevOpsPlatform:
-		link := fmt.Sprintf("%s/commit/%s?path=/%s", remoteUrl, finding.Commit, filePath)
+		link := fmt.Sprintf("%s/commit/%s?path=/%s", remote.Url, finding.Commit, filePath)
 		// Add line information if applicable
+		if hasInnerPath {
+			return link
+		}
 		if finding.StartLine != 0 {
 			link += fmt.Sprintf("&line=%d", finding.StartLine)
 		}
@@ -133,8 +118,8 @@ func filter(findings []report.Finding, redact uint) []report.Finding {
 					strings.Contains(fPrime.Secret, f.Secret) &&
 					!strings.Contains(strings.ToLower(fPrime.RuleID), "generic") {
 
-					genericMatch := strings.Replace(f.Match, f.Secret, "REDACTED", -1)
-					betterMatch := strings.Replace(fPrime.Match, fPrime.Secret, "REDACTED", -1)
+					genericMatch := strings.ReplaceAll(f.Match, f.Secret, "REDACTED")
+					betterMatch := strings.ReplaceAll(fPrime.Match, fPrime.Secret, "REDACTED")
 					logging.Trace().Msgf("skipping %s finding (%s), %s rule takes precedence (%s)", f.RuleID, genericMatch, fPrime.RuleID, betterMatch)
 					include = false
 					break
@@ -243,7 +228,3 @@ func printFinding(f report.Finding, noColor bool) {
 	}
 	fmt.Println("")
 }
-
-func isWhitespace(ch byte) bool {
-	return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r'
-}
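
Reviewer note: `createScmLink` now cuts `finding.File` at `sources.InnerPathSeparator` and, when an inner path is present, returns the link without line anchors. This is presumably because a line number inside an archive member has no meaning for the archive file itself. The sketch below illustrates that behaviour in isolation. The separator value `"!"` and the names `innerPathSeparator` and `blobLink` are assumptions for illustration, since the real constant's value does not appear in this diff.

```go
package main

import (
	"fmt"
	"strings"
)

// innerPathSeparator is a hypothetical stand-in for
// sources.InnerPathSeparator; its real value is not shown in this diff.
const innerPathSeparator = "!"

// blobLink builds a GitHub-style blob URL for a finding. If file carries an
// inner (in-archive) path, only the outer archive path is linked and no line
// anchor is appended.
func blobLink(remoteURL, commit, file string, startLine int) string {
	outer, _, hasInner := strings.Cut(file, innerPathSeparator)
	link := fmt.Sprintf("%s/blob/%s/%s", remoteURL, commit, outer)
	if hasInner || startLine == 0 {
		return link
	}
	return link + fmt.Sprintf("#L%d", startLine)
}

func main() {
	const remote = "https://example.com/org/repo"
	// Finding inside an archive: the link points at the archive itself.
	fmt.Println(blobLink(remote, "abc123", "dist/bundle.zip!config/.env", 7))
	// Plain file: the link keeps its line anchor.
	fmt.Println(blobLink(remote, "abc123", "config/.env", 7))
}
```

The sketch leaves out the path escaping done by `linkCleaner` and the `?plain=1` handling for `.md` and `.ipynb` files to keep the early-return logic visible.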

+ 42 - 23
detect/utils_test.go

@@ -7,25 +7,30 @@ import (
 
 	"github.com/zricethezav/gitleaks/v8/cmd/scm"
 	"github.com/zricethezav/gitleaks/v8/report"
+	"github.com/zricethezav/gitleaks/v8/sources"
 )
 
 func Test_createScmLink(t *testing.T) {
 	tests := map[string]struct {
-		platform scm.Platform
-		url      string
-		finding  report.Finding
-		want     string
+		remote  *sources.RemoteInfo
+		finding report.Finding
+		want    string
 	}{
 		// None
 		"no platform": {
-			platform: scm.NoPlatform,
-			want:     "",
+			remote: &sources.RemoteInfo{
+				Platform: scm.NoPlatform,
+				Url:      "",
+			},
+			want: "",
 		},
 
 		// GitHub
 		"github - single line": {
-			platform: scm.GitHubPlatform,
-			url:      "https://github.com/gitleaks/test",
+			remote: &sources.RemoteInfo{
+				Platform: scm.GitHubPlatform,
+				Url:      "https://github.com/gitleaks/test",
+			},
 			finding: report.Finding{
 				Commit:    "20553ad96a4a080c94a54d677db97eed8ce2560d",
 				File:      "metrics/% of sales/.env",
@@ -35,8 +40,10 @@ func Test_createScmLink(t *testing.T) {
 			want: "https://github.com/gitleaks/test/blob/20553ad96a4a080c94a54d677db97eed8ce2560d/metrics/%25%20of%20sales/.env#L25",
 		},
 		"github - multi line": {
-			platform: scm.GitHubPlatform,
-			url:      "https://github.com/gitleaks/test",
+			remote: &sources.RemoteInfo{
+				Platform: scm.GitHubPlatform,
+				Url:      "https://github.com/gitleaks/test",
+			},
 			finding: report.Finding{
 				Commit:    "7bad9f7654cf9701b62400281748c0e8efd97666",
 				File:      "config.json",
@@ -46,8 +53,10 @@ func Test_createScmLink(t *testing.T) {
 			want: "https://github.com/gitleaks/test/blob/7bad9f7654cf9701b62400281748c0e8efd97666/config.json#L235-L238",
 		},
 		"github - markdown": {
-			platform: scm.GitHubPlatform,
-			url:      "https://github.com/gitleaks/test",
+			remote: &sources.RemoteInfo{
+				Platform: scm.GitHubPlatform,
+				Url:      "https://github.com/gitleaks/test",
+			},
 			finding: report.Finding{
 				Commit:    "1fc8961d172f39ffb671766e472aa76f8d713e87",
 				File:      "docs/guides/ecosystem/discordjs.MD",
@@ -57,8 +66,10 @@ func Test_createScmLink(t *testing.T) {
 			want: "https://github.com/gitleaks/test/blob/1fc8961d172f39ffb671766e472aa76f8d713e87/docs/guides/ecosystem/discordjs.MD?plain=1#L34",
 		},
 		"github - jupyter notebook": {
-			platform: scm.GitHubPlatform,
-			url:      "https://github.com/gitleaks/test",
+			remote: &sources.RemoteInfo{
+				Platform: scm.GitHubPlatform,
+				Url:      "https://github.com/gitleaks/test",
+			},
 			finding: report.Finding{
 				Commit:    "8f56bd2369595bcadbb007e88ba294630fb05c7b",
 				File:      "Cloud/IPYNB/Overlapping Recommendation algorithm _OCuLaR_.ipynb",
@@ -70,8 +81,10 @@ func Test_createScmLink(t *testing.T) {
 
 		// GitLab
 		"gitlab - single line": {
-			platform: scm.GitLabPlatform,
-			url:      "https://gitlab.com/example-org/example-group/gitleaks",
+			remote: &sources.RemoteInfo{
+				Platform: scm.GitLabPlatform,
+				Url:      "https://gitlab.com/example-org/example-group/gitleaks",
+			},
 			finding: report.Finding{
 				Commit:    "213ffd1c9bfa906eb4c7731771132c58a4ca0139",
 				File:      ".gitlab-ci.yml",
@@ -81,8 +94,10 @@ func Test_createScmLink(t *testing.T) {
 			want: "https://gitlab.com/example-org/example-group/gitleaks/blob/213ffd1c9bfa906eb4c7731771132c58a4ca0139/.gitlab-ci.yml#L41",
 		},
 		"gitlab - multi line": {
-			platform: scm.GitLabPlatform,
-			url:      "https://gitlab.com/example-org/example-group/gitleaks",
+			remote: &sources.RemoteInfo{
+				Platform: scm.GitLabPlatform,
+				Url:      "https://gitlab.com/example-org/example-group/gitleaks",
+			},
 			finding: report.Finding{
 				Commit:    "63410f74e23a4e51e1f60b9feb073b5d325af878",
 				File:      ".vscode/launchSettings.json",
@@ -94,8 +109,10 @@ func Test_createScmLink(t *testing.T) {
 
 		// Azure DevOps
 		"azuredevops - single line": {
-			platform: scm.AzureDevOpsPlatform,
-			url:      "https://dev.azure.com/exampleorganisation/exampleproject/_git/exampleRepository",
+			remote: &sources.RemoteInfo{
+				Platform: scm.AzureDevOpsPlatform,
+				Url:      "https://dev.azure.com/exampleorganisation/exampleproject/_git/exampleRepository",
+			},
 			finding: report.Finding{
 				Commit:    "20553ad96a4a080c94a54d677db97eed8ce2560d",
 				File:      "examplefile.json",
@@ -107,8 +124,10 @@ func Test_createScmLink(t *testing.T) {
 
 		// Azure DevOps
 		"azuredevops - multi line": {
-			platform: scm.AzureDevOpsPlatform,
-			url:      "https://dev.azure.com/exampleorganisation/exampleproject/_git/exampleRepository",
+			remote: &sources.RemoteInfo{
+				Platform: scm.AzureDevOpsPlatform,
+				Url:      "https://dev.azure.com/exampleorganisation/exampleproject/_git/exampleRepository",
+			},
 			finding: report.Finding{
 				Commit:    "20553ad96a4a080c94a54d677db97eed8ce2560d",
 				File:      "examplefile.json",
@@ -120,7 +139,7 @@ func Test_createScmLink(t *testing.T) {
 	}
 	for name, tt := range tests {
 		t.Run(name, func(t *testing.T) {
-			actual := createScmLink(tt.platform, tt.url, tt.finding)
+			actual := createScmLink(tt.remote, tt.finding)
 			assert.Equal(t, tt.want, actual)
 		})
 	}

+ 19 - 0
go.mod

@@ -12,6 +12,7 @@ require (
 	github.com/gitleaks/go-gitdiff v0.9.1
 	github.com/google/go-cmp v0.6.0
 	github.com/h2non/filetype v1.1.3
+	github.com/mholt/archives v0.1.2
 	github.com/rs/zerolog v1.33.0
 	github.com/spf13/cobra v1.9.1
 	github.com/spf13/viper v1.19.0
@@ -23,26 +24,44 @@ require (
 	dario.cat/mergo v1.0.1 // indirect
 	github.com/Masterminds/goutils v1.1.1 // indirect
 	github.com/Masterminds/semver/v3 v3.3.0 // indirect
+	github.com/STARRY-S/zip v0.2.1 // indirect
+	github.com/andybalholm/brotli v1.1.2-0.20250424173009-453214e765f3 // indirect
 	github.com/aymanbagabas/go-osc52/v2 v2.0.1 // indirect
+	github.com/bodgit/plumbing v1.3.0 // indirect
+	github.com/bodgit/sevenzip v1.6.0 // indirect
+	github.com/bodgit/windows v1.0.1 // indirect
+	github.com/dsnet/compress v0.0.2-0.20230904184137-39efe44ab707 // indirect
 	github.com/google/uuid v1.6.0 // indirect
+	github.com/hashicorp/errwrap v1.1.0 // indirect
+	github.com/hashicorp/go-multierror v1.1.1 // indirect
+	github.com/hashicorp/golang-lru/v2 v2.0.7 // indirect
 	github.com/huandu/xstrings v1.5.0 // indirect
+	github.com/klauspost/compress v1.17.11 // indirect
+	github.com/klauspost/pgzip v1.2.6 // indirect
 	github.com/lucasb-eyer/go-colorful v1.2.0 // indirect
 	github.com/mattn/go-colorable v0.1.14 // indirect
 	github.com/mattn/go-isatty v0.0.20 // indirect
 	github.com/mattn/go-runewidth v0.0.14 // indirect
+	github.com/minio/minlz v1.0.0 // indirect
 	github.com/mitchellh/copystructure v1.2.0 // indirect
 	github.com/mitchellh/reflectwalk v1.0.2 // indirect
 	github.com/muesli/reflow v0.2.1-0.20210115123740-9e1d0d53df68 // indirect
 	github.com/muesli/termenv v0.15.1 // indirect
+	github.com/nwaples/rardecode/v2 v2.1.0 // indirect
 	github.com/pelletier/go-toml/v2 v2.2.3 // indirect
+	github.com/pierrec/lz4/v4 v4.1.21 // indirect
 	github.com/rivo/uniseg v0.2.0 // indirect
 	github.com/sagikazarmark/locafero v0.7.0 // indirect
 	github.com/sagikazarmark/slog-shim v0.1.0 // indirect
 	github.com/shopspring/decimal v1.4.0 // indirect
+	github.com/sorairolake/lzip-go v0.3.5 // indirect
 	github.com/sourcegraph/conc v0.3.0 // indirect
 	github.com/tetratelabs/wazero v1.9.0 // indirect
+	github.com/therootcompany/xz v1.0.1 // indirect
+	github.com/ulikunitz/xz v0.5.12 // indirect
 	github.com/wasilibs/wazero-helpers v0.0.0-20240620070341-3dff1577cd52 // indirect
 	go.uber.org/multierr v1.11.0 // indirect
+	go4.org v0.0.0-20230225012048-214862532bf5 // indirect
 	golang.org/x/crypto v0.32.0 // indirect
 )
 

+ 286 - 0
go.sum

@@ -1,22 +1,62 @@
+cloud.google.com/go v0.26.0/go.mod h1:aQUYkXzVsufM+DwF1aE+0xfcU+56JwCaLick0ClmMTw=
+cloud.google.com/go v0.34.0/go.mod h1:aQUYkXzVsufM+DwF1aE+0xfcU+56JwCaLick0ClmMTw=
+cloud.google.com/go v0.38.0/go.mod h1:990N+gfupTy94rShfmMCWGDn0LpTmnzTp2qbd1dvSRU=
+cloud.google.com/go v0.44.1/go.mod h1:iSa0KzasP4Uvy3f1mN/7PiObzGgflwredwwASm/v6AU=
+cloud.google.com/go v0.44.2/go.mod h1:60680Gw3Yr4ikxnPRS/oxxkBccT6SA1yMk63TGekxKY=
+cloud.google.com/go v0.45.1/go.mod h1:RpBamKRgapWJb87xiFSdk4g1CME7QZg3uwTez+TSTjc=
+cloud.google.com/go v0.46.3/go.mod h1:a6bKKbmY7er1mI7TEI4lsAkts/mkhTSZK8w33B4RAg0=
+cloud.google.com/go v0.50.0/go.mod h1:r9sluTvynVuxRIOHXQEHMFffphuXHOMZMycpNR5e6To=
+cloud.google.com/go v0.53.0/go.mod h1:fp/UouUEsRkN6ryDKNW/Upv/JBKnv6WDthjR6+vze6M=
+cloud.google.com/go/bigquery v1.0.1/go.mod h1:i/xbL2UlR5RvWAURpBYZTtm/cXjCha9lbfbpx4poX+o=
+cloud.google.com/go/bigquery v1.3.0/go.mod h1:PjpwJnslEMmckchkHFfq+HTD2DmtT67aNFKH1/VBDHE=
+cloud.google.com/go/datastore v1.0.0/go.mod h1:LXYbyblFSglQ5pkeyhO+Qmw7ukd3C+pD7TKLgZqpHYE=
+cloud.google.com/go/pubsub v1.0.1/go.mod h1:R0Gpsv3s54REJCy4fxDixWD93lHJMoZTyQ2kNxGRt3I=
+cloud.google.com/go/pubsub v1.1.0/go.mod h1:EwwdRX2sKPjnvnqCa270oGRyludottCI76h+R3AArQw=
+cloud.google.com/go/storage v1.0.0/go.mod h1:IhtSnM/ZTZV8YYJWCY8RULGVqBDmpoyjwiyrjsg+URw=
+cloud.google.com/go/storage v1.5.0/go.mod h1:tpKbwo567HUNpVclU5sGELwQWBDZ8gh0ZeosJ0Rtdos=
 dario.cat/mergo v1.0.1 h1:Ra4+bf83h2ztPIQYNP99R6m+Y7KfnARDfID+a+vLl4s=
 dario.cat/mergo v1.0.1/go.mod h1:uNxQE+84aUszobStD9th8a29P2fMDhsBdgRYvZOxGmk=
+dmitri.shuralyov.com/gpu/mtl v0.0.0-20190408044501-666a987793e9/go.mod h1:H6x//7gZCb22OMCxBHrMx7a5I7Hp++hsVxbQ4BYO7hU=
 github.com/BobuSumisu/aho-corasick v1.0.3 h1:uuf+JHwU9CHP2Vx+wAy6jcksJThhJS9ehR8a+4nPE9g=
 github.com/BobuSumisu/aho-corasick v1.0.3/go.mod h1:hm4jLcvZKI2vRF2WDU1N4p/jpWtpOzp3nLmi9AzX/XE=
+github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
+github.com/BurntSushi/xgb v0.0.0-20160522181843-27f122750802/go.mod h1:IVnqGOEym/WlBOVXweHU+Q+/VP0lqqI8lqeDx9IjBqo=
 github.com/Masterminds/goutils v1.1.1 h1:5nUrii3FMTL5diU80unEVvNevw1nH4+ZV4DSLVJLSYI=
 github.com/Masterminds/goutils v1.1.1/go.mod h1:8cTjp+g8YejhMuvIA5y2vz3BpJxksy863GQaJW2MFNU=
 github.com/Masterminds/semver/v3 v3.3.0 h1:B8LGeaivUe71a5qox1ICM/JLl0NqZSW5CHyL+hmvYS0=
 github.com/Masterminds/semver/v3 v3.3.0/go.mod h1:4V+yj/TJE1HU9XfppCwVMZq3I84lprf4nC11bSS5beM=
 github.com/Masterminds/sprig/v3 v3.3.0 h1:mQh0Yrg1XPo6vjYXgtf5OtijNAKJRNcTdOOGZe3tPhs=
 github.com/Masterminds/sprig/v3 v3.3.0/go.mod h1:Zy1iXRYNqNLUolqCpL4uhk6SHUMAOSCzdgBfDb35Lz0=
+github.com/STARRY-S/zip v0.2.1 h1:pWBd4tuSGm3wtpoqRZZ2EAwOmcHK6XFf7bU9qcJXyFg=
+github.com/STARRY-S/zip v0.2.1/go.mod h1:xNvshLODWtC4EJ702g7cTYn13G53o1+X9BWnPFpcWV4=
+github.com/andybalholm/brotli v1.1.2-0.20250424173009-453214e765f3 h1:8PmGpDEZl9yDpcdEr6Odf23feCxK3LNUNMxjXg41pZQ=
+github.com/andybalholm/brotli v1.1.2-0.20250424173009-453214e765f3/go.mod h1:05ib4cKhjx3OQYUY22hTVd34Bc8upXjOLL2rKwwZBoA=
 github.com/aymanbagabas/go-osc52/v2 v2.0.1 h1:HwpRHbFMcZLEVr42D4p7XBqjyuxQH5SMiErDT4WkJ2k=
 github.com/aymanbagabas/go-osc52/v2 v2.0.1/go.mod h1:uYgXzlJ7ZpABp8OJ+exZzJJhRNQ2ASbcXHWsFqH8hp8=
+github.com/bodgit/plumbing v1.3.0 h1:pf9Itz1JOQgn7vEOE7v7nlEfBykYqvUYioC61TwWCFU=
+github.com/bodgit/plumbing v1.3.0/go.mod h1:JOTb4XiRu5xfnmdnDJo6GmSbSbtSyufrsyZFByMtKEs=
+github.com/bodgit/sevenzip v1.6.0 h1:a4R0Wu6/P1o1pP/3VV++aEOcyeBxeO/xE2Y9NSTrr6A=
+github.com/bodgit/sevenzip v1.6.0/go.mod h1:zOBh9nJUof7tcrlqJFv1koWRrhz3LbDbUNngkuZxLMc=
+github.com/bodgit/windows v1.0.1 h1:tF7K6KOluPYygXa3Z2594zxlkbKPAOvqr97etrGNIz4=
+github.com/bodgit/windows v1.0.1/go.mod h1:a6JLwrB4KrTR5hBpp8FI9/9W9jJfeQ2h4XDXU74ZCdM=
+github.com/census-instrumentation/opencensus-proto v0.2.1/go.mod h1:f6KPmirojxKA12rnyqOA5BBL4O983OfeGPqjHWSTneU=
 github.com/charmbracelet/lipgloss v0.5.0 h1:lulQHuVeodSgDez+3rGiuxlPVXSnhth442DATR2/8t8=
 github.com/charmbracelet/lipgloss v0.5.0/go.mod h1:EZLha/HbzEt7cYqdFPovlqy5FZPj0xFhg5SaqxScmgs=
+github.com/chzyer/logex v1.1.10/go.mod h1:+Ywpsq7O8HXn0nuIou7OrIPyXbp3wmkHB+jjWRnGsAI=
+github.com/chzyer/readline v0.0.0-20180603132655-2972be24d48e/go.mod h1:nSuG5e5PlCu98SY8svDHJxuZscDgtXS6KTTbou5AhLI=
+github.com/chzyer/test v0.0.0-20180213035817-a1ea475d72b1/go.mod h1:Q3SI9o4m/ZMnBNeIyt5eFwwo7qiLfzFZmjNmxjkiQlU=
+github.com/client9/misspell v0.3.4/go.mod h1:qj6jICC3Q7zFZvVWo7KLAzC3yx5G7kyvSDkc90ppPyw=
 github.com/coreos/go-systemd/v22 v22.5.0/go.mod h1:Y58oyj3AT4RCenI/lSvhwexgC+NSVTIJ3seZv2GcEnc=
 github.com/cpuguy83/go-md2man/v2 v2.0.6/go.mod h1:oOW0eioCTA6cOiMLiUPZOpcVxMig6NIQQ7OS05n1F4g=
+github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
 github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
 github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc h1:U9qPSI2PIWSS1VwoXQT9A3Wy9MM3WgvqSxFWenqJduM=
 github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
+github.com/dsnet/compress v0.0.2-0.20230904184137-39efe44ab707 h1:2tV76y6Q9BB+NEBasnqvs7e49aEBFI8ejC89PSnWH+4=
+github.com/dsnet/compress v0.0.2-0.20230904184137-39efe44ab707/go.mod h1:qssHWj60/X5sZFNxpG4HBPDHVqxNm4DfnCKgrbZOT+s=
+github.com/dsnet/golib v0.0.0-20171103203638-1ea166775780/go.mod h1:Lj+Z9rebOhdfkVLjJ8T6VcRQv3SXugXy999NBtR9aFY=
+github.com/envoyproxy/go-control-plane v0.9.1-0.20191026205805-5f8ba28d4473/go.mod h1:YTl/9mNaCwkRvm6d1a2C3ymFceY/DCBVvsKhRF0iEA4=
+github.com/envoyproxy/protoc-gen-validate v0.1.0/go.mod h1:iSmxcyjqTsJpI2R4NaDN7+kN2VEUnK/pcBlmesArF7c=
 github.com/fatih/semgroup v1.2.0 h1:h/OLXwEM+3NNyAdZEpMiH1OzfplU09i2qXPVThGZvyg=
 github.com/fatih/semgroup v1.2.0/go.mod h1:1KAD4iIYfXjE4U13B48VM4z9QUwV5Tt8O4rS879kgm8=
 github.com/frankban/quicktest v1.14.6 h1:7Xjx+VpznH+oBnejlPUj8oUpdxnVs4f8XU8WnHkI4W8=
@@ -25,21 +65,71 @@ github.com/fsnotify/fsnotify v1.8.0 h1:dAwr6QBTBZIkG8roQaJjGof0pp0EeF+tNV7YBP3F/
 github.com/fsnotify/fsnotify v1.8.0/go.mod h1:8jBTzvmWwFyi3Pb8djgCCO5IBqzKJ/Jwo8TRcHyHii0=
 github.com/gitleaks/go-gitdiff v0.9.1 h1:ni6z6/3i9ODT685OLCTf+s/ERlWUNWQF4x1pvoNICw0=
 github.com/gitleaks/go-gitdiff v0.9.1/go.mod h1:pKz0X4YzCKZs30BL+weqBIG7mx0jl4tF1uXV9ZyNvrA=
+github.com/go-gl/glfw v0.0.0-20190409004039-e6da0acd62b1/go.mod h1:vR7hzQXu2zJy9AVAgeJqvqgH9Q5CA+iKCZ2gyEVpxRU=
+github.com/go-gl/glfw/v3.3/glfw v0.0.0-20191125211704-12ad95a8df72/go.mod h1:tQ2UAYgL5IevRw8kRxooKSPJfGvJ9fJQFa0TUsXzTg8=
 github.com/godbus/dbus/v5 v5.0.4/go.mod h1:xhWf0FNVPg57R7Z0UbKHbJfkEywrmjJnf7w5xrFpKfA=
+github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b/go.mod h1:SBH7ygxi8pfUlaOkMMuAQtPIUF8ecWP5IEl/CR7VP2Q=
+github.com/golang/groupcache v0.0.0-20190702054246-869f871628b6/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
+github.com/golang/groupcache v0.0.0-20191227052852-215e87163ea7/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
+github.com/golang/groupcache v0.0.0-20200121045136-8c9f03a8e57e/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
+github.com/golang/mock v1.1.1/go.mod h1:oTYuIxOrZwtPieC+H1uAHpcLFnEyAGVDL/k47Jfbm0A=
+github.com/golang/mock v1.2.0/go.mod h1:oTYuIxOrZwtPieC+H1uAHpcLFnEyAGVDL/k47Jfbm0A=
+github.com/golang/mock v1.3.1/go.mod h1:sBzyDLLjw3U8JLTeZvSv8jJB+tU5PVekmnlKIyFUx0Y=
+github.com/golang/mock v1.4.0/go.mod h1:UOMv5ysSaYNkG+OFQykRIcU/QvvxJf3p21QfJ2Bt3cw=
+github.com/golang/protobuf v1.2.0/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
+github.com/golang/protobuf v1.3.1/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
+github.com/golang/protobuf v1.3.2/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
+github.com/golang/protobuf v1.3.3/go.mod h1:vzj43D7+SQXF/4pzW/hwtAqwc6iTitCiVSaWz5lYuqw=
+github.com/google/btree v0.0.0-20180813153112-4030bb1f1f0c/go.mod h1:lNA+9X1NB3Zf8V7Ke586lFgjr2dZNuvo3lPJSGZ5JPQ=
+github.com/google/btree v1.0.0/go.mod h1:lNA+9X1NB3Zf8V7Ke586lFgjr2dZNuvo3lPJSGZ5JPQ=
+github.com/google/go-cmp v0.2.0/go.mod h1:oXzfMopK8JAjlY9xF4vHSVASa0yLyX7SntLO5aqRK0M=
+github.com/google/go-cmp v0.3.0/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU=
+github.com/google/go-cmp v0.3.1/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU=
+github.com/google/go-cmp v0.4.0/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
+github.com/google/go-cmp v0.5.5/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
 github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI=
 github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
+github.com/google/martian v2.1.0+incompatible/go.mod h1:9I4somxYTbIHy5NJKHRl3wXiIaQGbYVAs8BPL6v8lEs=
+github.com/google/pprof v0.0.0-20181206194817-3ea8567a2e57/go.mod h1:zfwlbNMJ+OItoe0UupaVj+oy1omPYYDuagoSzA8v9mc=
+github.com/google/pprof v0.0.0-20190515194954-54271f7e092f/go.mod h1:zfwlbNMJ+OItoe0UupaVj+oy1omPYYDuagoSzA8v9mc=
+github.com/google/pprof v0.0.0-20200212024743-f11f1df84d12/go.mod h1:ZgVRPoUq/hfqzAqh7sHMqb3I9Rq5C59dIz2SbBwJ4eM=
+github.com/google/renameio v0.1.0/go.mod h1:KWCgfxg9yswjAJkECMjeO8J8rahYeXnNhOm40UhjYkI=
 github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
 github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
+github.com/googleapis/gax-go/v2 v2.0.4/go.mod h1:0Wqv26UfaUD9n4G6kQubkQ+KchISgw+vpHVxEJEs9eg=
+github.com/googleapis/gax-go/v2 v2.0.5/go.mod h1:DWXyrwAJ9X0FpwwEdw+IPEYBICEFu5mhpdKc/us6bOk=
 github.com/h2non/filetype v1.1.3 h1:FKkx9QbD7HR/zjK1Ia5XiBsq9zdLi5Kf3zGyFTAFkGg=
 github.com/h2non/filetype v1.1.3/go.mod h1:319b3zT68BvV+WRj7cwy856M2ehB3HqNOt6sy1HndBY=
+github.com/hashicorp/errwrap v1.0.0/go.mod h1:YH+1FKiLXxHSkmPseP+kNlulaMuP3n2brvKWEqk/Jc4=
+github.com/hashicorp/errwrap v1.1.0 h1:OxrOeh75EUXMY8TBjag2fzXGZ40LB6IKw45YeGUDY2I=
+github.com/hashicorp/errwrap v1.1.0/go.mod h1:YH+1FKiLXxHSkmPseP+kNlulaMuP3n2brvKWEqk/Jc4=
+github.com/hashicorp/go-multierror v1.1.1 h1:H5DkEtf6CXdFp0N0Em5UCwQpXMWke8IA0+lD48awMYo=
+github.com/hashicorp/go-multierror v1.1.1/go.mod h1:iw975J/qwKPdAO1clOe2L8331t/9/fmwbPZ6JB6eMoM=
+github.com/hashicorp/golang-lru v0.5.0/go.mod h1:/m3WP610KZHVQ1SGc6re/UDhFvYD7pJ4Ao+sR/qLZy8=
+github.com/hashicorp/golang-lru v0.5.1/go.mod h1:/m3WP610KZHVQ1SGc6re/UDhFvYD7pJ4Ao+sR/qLZy8=
+github.com/hashicorp/golang-lru/v2 v2.0.7 h1:a+bsQ5rvGLjzHuww6tVxozPZFVghXaHOwFs4luLUK2k=
+github.com/hashicorp/golang-lru/v2 v2.0.7/go.mod h1:QeFd9opnmA6QUJc5vARoKUSoFhyfM2/ZepoAG6RGpeM=
 github.com/hashicorp/hcl v1.0.0 h1:0Anlzjpi4vEasTeNFn2mLJgTSwt0+6sfsiTG8qcWGx4=
 github.com/hashicorp/hcl v1.0.0/go.mod h1:E5yfLk+7swimpb2L/Alb/PJmXilQ/rhwaUYs4T20WEQ=
 github.com/huandu/xstrings v1.5.0 h1:2ag3IFq9ZDANvthTwTiqSSZLjDc+BedvHPAp5tJy2TI=
 github.com/huandu/xstrings v1.5.0/go.mod h1:y5/lhBue+AyNmUVz9RLU9xbLR0o4KIIExikq4ovT0aE=
+github.com/ianlancetaylor/demangle v0.0.0-20181102032728-5e5cf60278f6/go.mod h1:aSSvb/t6k1mPoxDqO4vJh6VOCGPwU4O0C2/Eqndh1Sc=
 github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2s0bqwp9tc8=
 github.com/inconshreveable/mousetrap v1.1.0/go.mod h1:vpF70FUmC8bwa3OWnCshd2FqLfsEA9PFc4w1p2J65bw=
+github.com/jstemmer/go-junit-report v0.0.0-20190106144839-af01ea7f8024/go.mod h1:6v2b51hI/fHJwM22ozAgKL4VKDeJcHhJFhtBdhmNjmU=
+github.com/jstemmer/go-junit-report v0.9.1/go.mod h1:Brl9GWCQeLvo8nXZwPNNblvFj/XSXhF0NWZEnDohbsk=
+github.com/kisielk/gotool v1.0.0/go.mod h1:XhKaO+MFFWcvkIS/tQcRk01m1F5IRFswLeQ+oQHNcck=
+github.com/klauspost/compress v1.4.1/go.mod h1:RyIbtBH6LamlWaDj8nUwkbUhJ87Yi3uG0guNDohfE1A=
+github.com/klauspost/compress v1.17.11 h1:In6xLpyWOi1+C7tXUUWv2ot1QvBjxevKAaI6IXrJmUc=
+github.com/klauspost/compress v1.17.11/go.mod h1:pMDklpSncoRMuLFrf1W9Ss9KT+0rH90U12bZKk7uwG0=
+github.com/klauspost/cpuid v1.2.0/go.mod h1:Pj4uuM528wm8OyEC2QMXAi2YiTZ96dNQPGgoMS4s3ek=
+github.com/klauspost/pgzip v1.2.6 h1:8RXeL5crjEUFnR2/Sn6GJNWtSQ3Dk8pq4CL3jvdDyjU=
+github.com/klauspost/pgzip v1.2.6/go.mod h1:Ch1tH69qFZu15pkjo5kYi6mth2Zzwzt50oCQKQE9RUs=
+github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo=
 github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE=
 github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk=
+github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ=
+github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI=
 github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
 github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
 github.com/lucasb-eyer/go-colorful v1.2.0 h1:1nnpGOrhyZZuNyfu1QjKiUICQ74+3FNCN69Aj6K7nkY=
@@ -60,6 +150,10 @@ github.com/mattn/go-runewidth v0.0.10/go.mod h1:RAqKPSqVFrSLVXbA8x7dzmKdmGzieGRC
 github.com/mattn/go-runewidth v0.0.13/go.mod h1:Jdepj2loyihRzMpdS35Xk/zdY8IAYHsh153qUoGf23w=
 github.com/mattn/go-runewidth v0.0.14 h1:+xnbZSEeDbOIg5/mE6JF0w6n9duR1l3/WmbinWVwUuU=
 github.com/mattn/go-runewidth v0.0.14/go.mod h1:Jdepj2loyihRzMpdS35Xk/zdY8IAYHsh153qUoGf23w=
+github.com/mholt/archives v0.1.2 h1:UBSe5NfYKHI1sy+S5dJsEsG9jsKKk8NJA4HCC+xTI4A=
+github.com/mholt/archives v0.1.2/go.mod h1:D7QzTHgw3ctfS6wgOO9dN+MFgdZpbksGCxprUOwZWDs=
+github.com/minio/minlz v1.0.0 h1:Kj7aJZ1//LlTP1DM8Jm7lNKvvJS2m74gyyXXn3+uJWQ=
+github.com/minio/minlz v1.0.0/go.mod h1:qT0aEB35q79LLornSzeDH75LBf3aH1MV+jB5w9Wasec=
 github.com/mitchellh/copystructure v1.2.0 h1:vpKXTN4ewci03Vljg/q9QvCGUDttBOGBIa15WveJJGw=
 github.com/mitchellh/copystructure v1.2.0/go.mod h1:qLl+cE2AmVv+CoeAwDPye/v+N2HKCj9FbZEVFJRxO9s=
 github.com/mitchellh/mapstructure v1.5.0 h1:jeMsZIYE/09sWLaz43PL7Gy6RuMjD2eJVyuac5Z2hdY=
@@ -71,26 +165,36 @@ github.com/muesli/reflow v0.2.1-0.20210115123740-9e1d0d53df68/go.mod h1:Xk+z4oIW
 github.com/muesli/termenv v0.11.1-0.20220204035834-5ac8409525e0/go.mod h1:Bd5NYQ7pd+SrtBSrSNoBBmXlcY8+Xj4BMJgh8qcZrvs=
 github.com/muesli/termenv v0.15.1 h1:UzuTb/+hhlBugQz28rpzey4ZuKcZ03MeKsoG7IJZIxs=
 github.com/muesli/termenv v0.15.1/go.mod h1:HeAQPTzpfs016yGtA4g00CsdYnVLJvxsS4ANqrZs2sQ=
+github.com/nwaples/rardecode/v2 v2.1.0 h1:JQl9ZoBPDy+nIZGb1mx8+anfHp/LV3NE2MjMiv0ct/U=
+github.com/nwaples/rardecode/v2 v2.1.0/go.mod h1:7uz379lSxPe6j9nvzxUZ+n7mnJNgjsRNb6IbvGVHRmw=
 github.com/pelletier/go-toml/v2 v2.2.3 h1:YmeHyLY8mFWbdkNWwpr+qIL2bEqT0o95WSdkNHvL12M=
 github.com/pelletier/go-toml/v2 v2.2.3/go.mod h1:MfCQTFTvCcUyyvvwm1+G6H/jORL20Xlb6rzQu9GuUkc=
+github.com/pierrec/lz4/v4 v4.1.21 h1:yOVMLb6qSIDP67pl/5F7RepeKYu/VmTyEXvuMI5d9mQ=
+github.com/pierrec/lz4/v4 v4.1.21/go.mod h1:gZWDp/Ze/IJXGXf23ltt2EXimqmTUXEy0GFuRQyBid4=
 github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
+github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
 github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 h1:Jamvg5psRIccs7FGNTlIRMkT8wgtp5eCXdBlqhYGL6U=
 github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
+github.com/prometheus/client_model v0.0.0-20190812154241-14fe0d1b01d4/go.mod h1:xMI15A0UPsDsEKsMN9yxemIoYk6Tm2C1GtYGdfGttqA=
 github.com/rivo/uniseg v0.1.0/go.mod h1:J6wj4VEh+S6ZtnVlnTBMWIodfgj8LQOQFoIToxlJtxc=
 github.com/rivo/uniseg v0.2.0 h1:S1pD9weZBuJdFmowNwbpi7BJ8TNftyUImj/0WQi72jY=
 github.com/rivo/uniseg v0.2.0/go.mod h1:J6wj4VEh+S6ZtnVlnTBMWIodfgj8LQOQFoIToxlJtxc=
+github.com/rogpeppe/go-internal v1.3.0/go.mod h1:M8bDsm7K2OlrFYOpmOWEs/qY81heoFRclV5y23lUDJ4=
 github.com/rogpeppe/go-internal v1.9.0 h1:73kH8U+JUqXU8lRuOHeVHaa/SZPifC7BkcraZVejAe8=
 github.com/rogpeppe/go-internal v1.9.0/go.mod h1:WtVeX8xhTBvf0smdhujwtBcq4Qrzq/fJaraNFVN+nFs=
 github.com/rs/xid v1.5.0/go.mod h1:trrq9SKmegXys3aeAKXMUTdJsYXVwGY3RLcfgqegfbg=
 github.com/rs/zerolog v1.33.0 h1:1cU2KZkvPxNyfgEmhHAz/1A9Bz+llsdYzklWFzgp0r8=
 github.com/rs/zerolog v1.33.0/go.mod h1:/7mN4D5sKwJLZQ2b/znpjC3/GQWY/xaDXUM0kKWRHss=
 github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
+github.com/rwcarlsen/goexif v0.0.0-20190401172101-9e8deecbddbd/go.mod h1:hPqNNc0+uJM6H+SuU8sEs5K5IQeKccPqeSjfgcKGgPk=
 github.com/sagikazarmark/locafero v0.7.0 h1:5MqpDsTGNDhY8sGp0Aowyf0qKsPrhewaLSsFaodPcyo=
 github.com/sagikazarmark/locafero v0.7.0/go.mod h1:2za3Cg5rMaTMoG/2Ulr9AwtFaIppKXTRYnozin4aB5k=
 github.com/sagikazarmark/slog-shim v0.1.0 h1:diDBnUNK9N/354PgrxMywXnAwEr1QZcOr6gto+ugjYE=
 github.com/sagikazarmark/slog-shim v0.1.0/go.mod h1:SrcSrq8aKtyuqEI1uvTDTK1arOWRIczQRv+GVI1AkeQ=
 github.com/shopspring/decimal v1.4.0 h1:bxl37RwXBklmTi0C79JfXCEBD1cqqHt0bbgBAGFp81k=
 github.com/shopspring/decimal v1.4.0/go.mod h1:gawqmDU56v4yIKSwfBSFip1HdCCXN8/+DMd9qYNcwME=
+github.com/sorairolake/lzip-go v0.3.5 h1:ms5Xri9o1JBIWvOFAorYtUNik6HI3HgBTkISiqu0Cwg=
+github.com/sorairolake/lzip-go v0.3.5/go.mod h1:N0KYq5iWrMXI0ZEXKXaS9hCyOjZUQdBDEIbXfoUwbdk=
 github.com/sourcegraph/conc v0.3.0 h1:OQTbbt6P72L20UqAkXXuLOj79LfEanQ+YQFNpLA9ySo=
 github.com/sourcegraph/conc v0.3.0/go.mod h1:Sdozi7LEKbFPqYX2/J+iBAM6HpqSLTASQIKqDmF7Mt0=
 github.com/spf13/afero v1.12.0 h1:UcOPyRBYczmFn6yvphxkn9ZEOY65cpwGKb5mL36mrqs=
@@ -103,36 +207,218 @@ github.com/spf13/pflag v1.0.6 h1:jFzHGLGAlb3ruxLB8MhbI6A8+AQX/2eW4qeyNZXNp2o=
 github.com/spf13/pflag v1.0.6/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg=
 github.com/spf13/viper v1.19.0 h1:RWq5SEjt8o25SROyN3z2OrDB9l7RPd3lwTWU8EcEdcI=
 github.com/spf13/viper v1.19.0/go.mod h1:GQUN9bilAbhU/jgc1bKs99f/suXKeUMct8Adx5+Ntkg=
+github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
+github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw=
+github.com/stretchr/objx v0.5.0/go.mod h1:Yh+to48EsGEfYuaHDzXPcE3xhTkx73EhmCGUpEOglKo=
+github.com/stretchr/testify v1.4.0/go.mod h1:j7eGeouHqKxXV5pUuKE4zz7dFj8WfuZ+81PSLYec5m4=
+github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
+github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU=
+github.com/stretchr/testify v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
 github.com/stretchr/testify v1.10.0 h1:Xv5erBjTwe/5IxqUQTdXv5kgmIvbHo3QQyRwhJsOfJA=
 github.com/stretchr/testify v1.10.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
 github.com/subosito/gotenv v1.6.0 h1:9NlTDc1FTs4qu0DDq7AEtTPNw6SVm7uBMsUCUjABIf8=
 github.com/subosito/gotenv v1.6.0/go.mod h1:Dk4QP5c2W3ibzajGcXpNraDfq2IrhjMIvMSWPKKo0FU=
 github.com/tetratelabs/wazero v1.9.0 h1:IcZ56OuxrtaEz8UYNRHBrUa9bYeX9oVY93KspZZBf/I=
 github.com/tetratelabs/wazero v1.9.0/go.mod h1:TSbcXCfFP0L2FGkRPxHphadXPjo1T6W+CseNNY7EkjM=
+github.com/therootcompany/xz v1.0.1 h1:CmOtsn1CbtmyYiusbfmhmkpAAETj0wBIH6kCYaX+xzw=
+github.com/therootcompany/xz v1.0.1/go.mod h1:3K3UH1yCKgBneZYhuQUvJ9HPD19UEXEI0BWbMn8qNMY=
+github.com/ulikunitz/xz v0.5.8/go.mod h1:nbz6k7qbPmH4IRqmfOplQw/tblSgqTqBwxkY0oWt/14=
+github.com/ulikunitz/xz v0.5.12 h1:37Nm15o69RwBkXM0J6A5OlE67RZTfzUxTj8fB3dfcsc=
+github.com/ulikunitz/xz v0.5.12/go.mod h1:nbz6k7qbPmH4IRqmfOplQw/tblSgqTqBwxkY0oWt/14=
 github.com/wasilibs/go-re2 v1.9.0 h1:kjAd8qbNvV4Ve2Uf+zrpTCrDHtqH4dlsRXktywo73JQ=
 github.com/wasilibs/go-re2 v1.9.0/go.mod h1:0sRtscWgpUdNA137bmr1IUgrRX0Su4dcn9AEe61y+yI=
 github.com/wasilibs/wazero-helpers v0.0.0-20240620070341-3dff1577cd52 h1:OvLBa8SqJnZ6P+mjlzc2K7PM22rRUPE1x32G9DTPrC4=
 github.com/wasilibs/wazero-helpers v0.0.0-20240620070341-3dff1577cd52/go.mod h1:jMeV4Vpbi8osrE/pKUxRZkVaA0EX7NZN0A9/oRzgpgY=
+github.com/xyproto/randomstring v1.0.5 h1:YtlWPoRdgMu3NZtP45drfy1GKoojuR7hmRcnhZqKjWU=
+github.com/xyproto/randomstring v1.0.5/go.mod h1:rgmS5DeNXLivK7YprL0pY+lTuhNQW3iGxZ18UQApw/E=
+github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY=
+go.opencensus.io v0.21.0/go.mod h1:mSImk1erAIZhrmZN+AvHh14ztQfjbGwt4TtuofqLduU=
+go.opencensus.io v0.22.0/go.mod h1:+kGneAE2xo2IficOXnaByMWTGM9T73dGwxeWcUqIpI8=
+go.opencensus.io v0.22.2/go.mod h1:yxeiOL68Rb0Xd1ddK5vPZ/oVn4vY4Ynel7k9FzqtOIw=
+go.opencensus.io v0.22.3/go.mod h1:yxeiOL68Rb0Xd1ddK5vPZ/oVn4vY4Ynel7k9FzqtOIw=
 go.uber.org/multierr v1.11.0 h1:blXXJkSxSSfBVBlC76pxqeO+LN3aDfLQo+309xJstO0=
 go.uber.org/multierr v1.11.0/go.mod h1:20+QtiLqy0Nd6FdQB9TLXag12DsQkrbs3htMFfDN80Y=
+go4.org v0.0.0-20230225012048-214862532bf5 h1:nifaUDeh+rPaBCMPMQHZmvJf+QdpLFnuQPwx+LxVmtc=
+go4.org v0.0.0-20230225012048-214862532bf5/go.mod h1:F57wTi5Lrj6WLyswp5EYV1ncrEbFGHD4hhz6S1ZYeaU=
+golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
+golang.org/x/crypto v0.0.0-20190510104115-cbcb75029529/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI=
+golang.org/x/crypto v0.0.0-20190605123033-f99c8df09eb5/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI=
+golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI=
+golang.org/x/crypto v0.0.0-20210921155107-089bfa567519/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
 golang.org/x/crypto v0.32.0 h1:euUpcYgM8WcP71gNpTqQCn6rC2t6ULUPiOzfWaXVVfc=
 golang.org/x/crypto v0.32.0/go.mod h1:ZnnJkOaASj8g0AjIduWNlq2NRxL0PlBrbKVyZ6V/Ugc=
+golang.org/x/exp v0.0.0-20190121172915-509febef88a4/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
+golang.org/x/exp v0.0.0-20190306152737-a1d7652674e8/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
+golang.org/x/exp v0.0.0-20190510132918-efd6b22b2522/go.mod h1:ZjyILWgesfNpC6sMxTJOJm9Kp84zZh5NQWvqDGG3Qr8=
+golang.org/x/exp v0.0.0-20190829153037-c13cbed26979/go.mod h1:86+5VVa7VpoJ4kLfm080zCjGlMRFzhUhsZKEZO7MGek=
+golang.org/x/exp v0.0.0-20191030013958-a1ab85dbe136/go.mod h1:JXzH8nQsPlswgeRAPE3MuO9GYsAcnJvJ4vnMwN/5qkY=
+golang.org/x/exp v0.0.0-20191129062945-2f5052295587/go.mod h1:2RIsYlXP63K8oxa1u096TMicItID8zy7Y6sNkU49FU4=
+golang.org/x/exp v0.0.0-20191227195350-da58074b4299/go.mod h1:2RIsYlXP63K8oxa1u096TMicItID8zy7Y6sNkU49FU4=
+golang.org/x/exp v0.0.0-20200207192155-f17229e696bd/go.mod h1:J/WKrq2StrnmMY6+EHIKF9dgMWnmCNThgcyBT1FY9mM=
 golang.org/x/exp v0.0.0-20250218142911-aa4b98e5adaa h1:t2QcU6V556bFjYgu4L6C+6VrCPyJZ+eyRsABUPs1mz4=
 golang.org/x/exp v0.0.0-20250218142911-aa4b98e5adaa/go.mod h1:BHOTPb3L19zxehTsLoJXVaTktb06DFgmdW6Wb9s8jqk=
+golang.org/x/image v0.0.0-20190227222117-0694c2d4d067/go.mod h1:kZ7UVZpmo3dzQBMxlp+ypCbDeSB+sBbTgSJuh5dn5js=
+golang.org/x/image v0.0.0-20190802002840-cff245a6509b/go.mod h1:FeLwcggjj3mMvU+oOTbSwawSJRM1uh48EjtB4UJZlP0=
+golang.org/x/lint v0.0.0-20181026193005-c67002cb31c3/go.mod h1:UVdnD1Gm6xHRNCYTkRU2/jEulfH38KcIWyp/GAMgvoE=
+golang.org/x/lint v0.0.0-20190227174305-5b3e6a55c961/go.mod h1:wehouNa3lNwaWXcvxsM5YxQ5yQlVC4a0KAMCusXpPoU=
+golang.org/x/lint v0.0.0-20190301231843-5614ed5bae6f/go.mod h1:UVdnD1Gm6xHRNCYTkRU2/jEulfH38KcIWyp/GAMgvoE=
+golang.org/x/lint v0.0.0-20190313153728-d0100b6bd8b3/go.mod h1:6SW0HCj/g11FgYtHlgUYUwCkIfeOF89ocIRzGO/8vkc=
+golang.org/x/lint v0.0.0-20190409202823-959b441ac422/go.mod h1:6SW0HCj/g11FgYtHlgUYUwCkIfeOF89ocIRzGO/8vkc=
+golang.org/x/lint v0.0.0-20190909230951-414d861bb4ac/go.mod h1:6SW0HCj/g11FgYtHlgUYUwCkIfeOF89ocIRzGO/8vkc=
+golang.org/x/lint v0.0.0-20190930215403-16217165b5de/go.mod h1:6SW0HCj/g11FgYtHlgUYUwCkIfeOF89ocIRzGO/8vkc=
+golang.org/x/lint v0.0.0-20191125180803-fdd1cda4f05f/go.mod h1:5qLYkcX4OjUUV8bRuDixDT3tpyyb+LUpUlRWLxfhWrs=
+golang.org/x/lint v0.0.0-20200130185559-910be7a94367/go.mod h1:3xt1FjdF8hUf6vQPIChWIBhFzV8gjjsPE/fR3IyQdNY=
+golang.org/x/mobile v0.0.0-20190312151609-d3739f865fa6/go.mod h1:z+o9i4GpDbdi3rU15maQ/Ox0txvL9dWGYEHz965HBQE=
+golang.org/x/mobile v0.0.0-20190719004257-d2bd2a29d028/go.mod h1:E/iHnbuqvinMTCcRqshq8CkpyQDoeVncDDYHnLhea+o=
+golang.org/x/mod v0.0.0-20190513183733-4bf6d317e70e/go.mod h1:mXi4GBBbnImb6dmsKGUJ2LatrhH/nqhxcFungHvyanc=
+golang.org/x/mod v0.1.0/go.mod h1:0QHyrYULN0/3qlju5TqG8bIK38QM8yzMo5ekMj3DlcY=
+golang.org/x/mod v0.1.1-0.20191105210325-c90efee705ee/go.mod h1:QqPTAvyqsEbceGzBzNggFXnrqF1CaUcvgkdR5Ot7KZg=
+golang.org/x/mod v0.2.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
+golang.org/x/mod v0.6.0-dev.0.20220419223038-86c51ed26bb4/go.mod h1:jJ57K6gSWd91VN4djpZkiMVwK6gcyfeH4XE8wZrZaV4=
+golang.org/x/net v0.0.0-20180724234803-3673e40ba225/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
+golang.org/x/net v0.0.0-20180826012351-8a410e7b638d/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
+golang.org/x/net v0.0.0-20190108225652-1e06a53dbb7e/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
+golang.org/x/net v0.0.0-20190213061140-3a22650c66bd/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
+golang.org/x/net v0.0.0-20190311183353-d8887717615a/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
+golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
+golang.org/x/net v0.0.0-20190501004415-9ce7a6920f09/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
+golang.org/x/net v0.0.0-20190503192946-f4e77d36d62c/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
+golang.org/x/net v0.0.0-20190603091049-60506f45cf65/go.mod h1:HSz+uSET+XFnRR8LxR5pz3Of3rY3CfYBVs4xY44aLks=
+golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
+golang.org/x/net v0.0.0-20190724013045-ca1201d0de80/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
+golang.org/x/net v0.0.0-20191209160850-c0dbc17a3553/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
+golang.org/x/net v0.0.0-20200202094626-16171245cfb2/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
+golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
+golang.org/x/net v0.0.0-20220722155237-a158d28d115b/go.mod h1:XRhObCWvk6IyKnWLug+ECip1KBveYUHfp+8e9klMJ9c=
+golang.org/x/net v0.7.0/go.mod h1:2Tu9+aMcznHK/AK1HMvgo6xiTLG5rD5rZLDS+rp2Bjs=
+golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U=
+golang.org/x/oauth2 v0.0.0-20190226205417-e64efc72b421/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
+golang.org/x/oauth2 v0.0.0-20190604053449-0f29369cfe45/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
+golang.org/x/oauth2 v0.0.0-20191202225959-858c2ad4c8b6/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
+golang.org/x/oauth2 v0.0.0-20200107190931-bf48bf16ab8d/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
+golang.org/x/sync v0.0.0-20180314180146-1d60e4601c6f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
+golang.org/x/sync v0.0.0-20181108010431-42b317875d0f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
+golang.org/x/sync v0.0.0-20181221193216-37e7f081c4d4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
+golang.org/x/sync v0.0.0-20190227155943-e225da77a7e6/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
+golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
+golang.org/x/sync v0.0.0-20190911185100-cd5d95a43a6e/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
+golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
 golang.org/x/sync v0.11.0 h1:GGz8+XQP4FvTTrjZPzNKTMFtSXH80RAzG+5ghFPgK9w=
 golang.org/x/sync v0.11.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
+golang.org/x/sys v0.0.0-20180830151530-49385e6e1522/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
+golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
+golang.org/x/sys v0.0.0-20190312061237-fead79001313/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20190502145724-3ef323f4f1fd/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20190507160741-ecd444e8653b/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20190606165138-5da285871e9c/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20190624142023-c5567b49c5d0/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20190726091711-fc99dfbffb4e/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20191204072324-ce4227a45e2e/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20191228213918-04cbcbbfeed8/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20200212091648-12a6c2dcc1e4/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
 golang.org/x/sys v0.0.0-20210630005230-0f9fa26af87c/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
+golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
+golang.org/x/sys v0.0.0-20220722155257-8c9f86f7a55f/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
 golang.org/x/sys v0.0.0-20220811171246-fbc7d0a398ab/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
+golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
 golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
 golang.org/x/sys v0.12.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
 golang.org/x/sys v0.30.0 h1:QjkSwP/36a20jFYWkSue1YwXzLmsV5Gfq7Eiy72C1uc=
 golang.org/x/sys v0.30.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
+golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
+golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8=
+golang.org/x/term v0.5.0/go.mod h1:jMB1sMXY+tzblOD4FWmEbocvup2/aLOaQEp7JmGp78k=
+golang.org/x/text v0.0.0-20170915032832-14c0d48ead0c/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
+golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
+golang.org/x/text v0.3.1-0.20180807135948-17ff2d5776d2/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
+golang.org/x/text v0.3.2/go.mod h1:bEr9sfX3Q8Zfm5fL9x+3itogRgK3+ptLWKqgva+5dAk=
+golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
+golang.org/x/text v0.3.7/go.mod h1:u+2+/6zg+i71rQMx5EYifcz6MCKuco9NR6JIITiCfzQ=
+golang.org/x/text v0.7.0/go.mod h1:mrYo+phRRbMaCq/xk9113O4dZlRixOauAjOtrjsXDZ8=
 golang.org/x/text v0.22.0 h1:bofq7m3/HAFvbF51jz3Q9wLg3jkvSPuiZu/pD1XwgtM=
 golang.org/x/text v0.22.0/go.mod h1:YRoo4H8PVmsu+E3Ou7cqLVH8oXWIHVoX0jqUWALQhfY=
+golang.org/x/time v0.0.0-20181108054448-85acf8d2951c/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
+golang.org/x/time v0.0.0-20190308202827-9d24e82272b4/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
+golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
+golang.org/x/tools v0.0.0-20190114222345-bf090417da8b/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
+golang.org/x/tools v0.0.0-20190226205152-f727befe758c/go.mod h1:9Yl7xja0Znq3iFh3HoIrodX9oNMXvdceNzlUR8zjMvY=
+golang.org/x/tools v0.0.0-20190311212946-11955173bddd/go.mod h1:LCzVGOaR6xXOjkQ3onu1FJEFr0SW1gC7cKk1uF8kGRs=
+golang.org/x/tools v0.0.0-20190312151545-0bb0c0a6e846/go.mod h1:LCzVGOaR6xXOjkQ3onu1FJEFr0SW1gC7cKk1uF8kGRs=
+golang.org/x/tools v0.0.0-20190312170243-e65039ee4138/go.mod h1:LCzVGOaR6xXOjkQ3onu1FJEFr0SW1gC7cKk1uF8kGRs=
+golang.org/x/tools v0.0.0-20190425150028-36563e24a262/go.mod h1:RgjU9mgBXZiqYHBnxXauZ1Gv1EHHAz9KjViQ78xBX0Q=
+golang.org/x/tools v0.0.0-20190506145303-2d16b83fe98c/go.mod h1:RgjU9mgBXZiqYHBnxXauZ1Gv1EHHAz9KjViQ78xBX0Q=
+golang.org/x/tools v0.0.0-20190524140312-2c0ae7006135/go.mod h1:RgjU9mgBXZiqYHBnxXauZ1Gv1EHHAz9KjViQ78xBX0Q=
+golang.org/x/tools v0.0.0-20190606124116-d0a3d012864b/go.mod h1:/rFqwRUd4F7ZHNgwSSTFct+R/Kf4OFW1sUzUTQQTgfc=
+golang.org/x/tools v0.0.0-20190621195816-6e04913cbbac/go.mod h1:/rFqwRUd4F7ZHNgwSSTFct+R/Kf4OFW1sUzUTQQTgfc=
+golang.org/x/tools v0.0.0-20190628153133-6cdbf07be9d0/go.mod h1:/rFqwRUd4F7ZHNgwSSTFct+R/Kf4OFW1sUzUTQQTgfc=
+golang.org/x/tools v0.0.0-20190816200558-6889da9d5479/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
+golang.org/x/tools v0.0.0-20190911174233-4f2ddba30aff/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
+golang.org/x/tools v0.0.0-20191012152004-8de300cfc20a/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
+golang.org/x/tools v0.0.0-20191113191852-77e3bb0ad9e7/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
+golang.org/x/tools v0.0.0-20191115202509-3a792d9c32b2/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
+golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
+golang.org/x/tools v0.0.0-20191125144606-a911d9008d1f/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
+golang.org/x/tools v0.0.0-20191216173652-a0e659d51361/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28=
+golang.org/x/tools v0.0.0-20191227053925-7b8e75db28f4/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28=
+golang.org/x/tools v0.0.0-20200130002326-2f3ba24bd6e7/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28=
+golang.org/x/tools v0.0.0-20200207183749-b753a1ba74fa/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28=
+golang.org/x/tools v0.0.0-20200212150539-ea181f53ac56/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28=
+golang.org/x/tools v0.1.12/go.mod h1:hNGJHUnrk76NpqgfD5Aqm5Crs+Hm0VOH/i9J2+nxYbc=
+golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
+golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
+golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
+google.golang.org/api v0.4.0/go.mod h1:8k5glujaEP+g9n7WNsDg8QP6cUVNI86fCNMcbazEtwE=
+google.golang.org/api v0.7.0/go.mod h1:WtwebWUNSVBH/HAw79HIFXZNqEvBhG+Ra+ax0hx3E3M=
+google.golang.org/api v0.8.0/go.mod h1:o4eAsZoiT+ibD93RtjEohWalFOjRDx6CVaqeizhEnKg=
+google.golang.org/api v0.9.0/go.mod h1:o4eAsZoiT+ibD93RtjEohWalFOjRDx6CVaqeizhEnKg=
+google.golang.org/api v0.13.0/go.mod h1:iLdEw5Ide6rF15KTC1Kkl0iskquN2gFfn9o9XIsbkAI=
+google.golang.org/api v0.14.0/go.mod h1:iLdEw5Ide6rF15KTC1Kkl0iskquN2gFfn9o9XIsbkAI=
+google.golang.org/api v0.15.0/go.mod h1:iLdEw5Ide6rF15KTC1Kkl0iskquN2gFfn9o9XIsbkAI=
+google.golang.org/api v0.17.0/go.mod h1:BwFmGc8tA3vsd7r/7kR8DY7iEEGSU04BFxCo5jP/sfE=
+google.golang.org/appengine v1.1.0/go.mod h1:EbEs0AVv82hx2wNQdGPgUI5lhzA/G0D9YwlJXL52JkM=
+google.golang.org/appengine v1.4.0/go.mod h1:xpcJRLb0r/rnEns0DIKYYv+WjYCduHsrkT7/EB5XEv4=
+google.golang.org/appengine v1.5.0/go.mod h1:xpcJRLb0r/rnEns0DIKYYv+WjYCduHsrkT7/EB5XEv4=
+google.golang.org/appengine v1.6.1/go.mod h1:i06prIuMbXzDqacNJfV5OdTW448YApPu5ww/cMBSeb0=
+google.golang.org/appengine v1.6.5/go.mod h1:8WjMMxjGQR8xUklV/ARdw2HLXBOI7O7uCIDZVag1xfc=
+google.golang.org/genproto v0.0.0-20180817151627-c66870c02cf8/go.mod h1:JiN7NxoALGmiZfu7CAH4rXhgtRTLTxftemlI0sWmxmc=
+google.golang.org/genproto v0.0.0-20190307195333-5fe7a883aa19/go.mod h1:VzzqZJRnGkLBvHegQrXjBqPurQTc5/KpmUdxsrq26oE=
+google.golang.org/genproto v0.0.0-20190418145605-e7d98fc518a7/go.mod h1:VzzqZJRnGkLBvHegQrXjBqPurQTc5/KpmUdxsrq26oE=
+google.golang.org/genproto v0.0.0-20190425155659-357c62f0e4bb/go.mod h1:VzzqZJRnGkLBvHegQrXjBqPurQTc5/KpmUdxsrq26oE=
+google.golang.org/genproto v0.0.0-20190502173448-54afdca5d873/go.mod h1:VzzqZJRnGkLBvHegQrXjBqPurQTc5/KpmUdxsrq26oE=
+google.golang.org/genproto v0.0.0-20190801165951-fa694d86fc64/go.mod h1:DMBHOl98Agz4BDEuKkezgsaosCRResVns1a3J2ZsMNc=
+google.golang.org/genproto v0.0.0-20190819201941-24fa4b261c55/go.mod h1:DMBHOl98Agz4BDEuKkezgsaosCRResVns1a3J2ZsMNc=
+google.golang.org/genproto v0.0.0-20190911173649-1774047e7e51/go.mod h1:IbNlFCBrqXvoKpeg0TB2l7cyZUmoaFKYIwrEpbDKLA8=
+google.golang.org/genproto v0.0.0-20191108220845-16a3f7862a1a/go.mod h1:n3cpQtvxv34hfy77yVDNjmbRyujviMdxYliBSkLhpCc=
+google.golang.org/genproto v0.0.0-20191115194625-c23dd37a84c9/go.mod h1:n3cpQtvxv34hfy77yVDNjmbRyujviMdxYliBSkLhpCc=
+google.golang.org/genproto v0.0.0-20191216164720-4f79533eabd1/go.mod h1:n3cpQtvxv34hfy77yVDNjmbRyujviMdxYliBSkLhpCc=
+google.golang.org/genproto v0.0.0-20191230161307-f3c370f40bfb/go.mod h1:n3cpQtvxv34hfy77yVDNjmbRyujviMdxYliBSkLhpCc=
+google.golang.org/genproto v0.0.0-20200212174721-66ed5ce911ce/go.mod h1:55QSHmfGQM9UVYDPBsyGGes0y52j32PQ3BqQfXhyH3c=
+google.golang.org/grpc v1.19.0/go.mod h1:mqu4LbDTu4XGKhr4mRzUsmM4RtVoemTSY81AxZiDr8c=
+google.golang.org/grpc v1.20.1/go.mod h1:10oTOabMzJvdu6/UiuZezV6QK5dSlG84ov/aaiqXj38=
+google.golang.org/grpc v1.21.1/go.mod h1:oYelfM1adQP15Ek0mdvEgi9Df8B9CZIaU1084ijfRaM=
+google.golang.org/grpc v1.23.0/go.mod h1:Y5yQAOtifL1yxbo5wqy6BxZv8vAUGQwXBOALyacEbxg=
+google.golang.org/grpc v1.26.0/go.mod h1:qbnxyOmOxrQa7FizSgH+ReBfzJrCY1pSN7KXBS8abTk=
+google.golang.org/grpc v1.27.0/go.mod h1:qbnxyOmOxrQa7FizSgH+ReBfzJrCY1pSN7KXBS8abTk=
+google.golang.org/grpc v1.27.1/go.mod h1:qbnxyOmOxrQa7FizSgH+ReBfzJrCY1pSN7KXBS8abTk=
 gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
+gopkg.in/check.v1 v1.0.0-20180628173108-788fd7840127/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
 gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15 h1:YR8cESwS4TdDjEe65xsg0ogRM/Nc3DYOhEAlW+xobZo=
 gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
+gopkg.in/errgo.v2 v2.1.0/go.mod h1:hNsd1EY+bozCKY1Ytp96fpM3vjJbqLJn88ws8XvfDNI=
 gopkg.in/ini.v1 v1.67.0 h1:Dgnx+6+nfE+IfzjUEISNeydPJh9AXNNsWbGP9KzCsOA=
 gopkg.in/ini.v1 v1.67.0/go.mod h1:pNLf8WUiyNEtQjuu5G5vTm06TEv9tsIgeAvK8hOrP4k=
+gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
+gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
 gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
 gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
+honnef.co/go/tools v0.0.0-20190102054323-c2f93a96b099/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
+honnef.co/go/tools v0.0.0-20190106161140-3f1c8253044a/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
+honnef.co/go/tools v0.0.0-20190418001031-e561f6794a2a/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
+honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
+honnef.co/go/tools v0.0.1-2019.2.3/go.mod h1:a3bituU0lyd329TUQxRnasdCoJDkEUEAqEt0JzvZhAg=
+rsc.io/binaryregexp v0.2.0/go.mod h1:qTv7/COck+e2FymRvadv62gMdZztPaShugOCi3I+8D8=
+rsc.io/quote/v3 v3.1.0/go.mod h1:yEA65RcK8LyAZtP9Kv3t0HmxON59tX3rD+tICJqUlj0=
+rsc.io/sampler v1.3.0/go.mod h1:T1hPZKmBbMNahiBKFy5HrXp6adAjACjK9JXDnKaTXpA=

+ 2 - 2
report/finding.go

@@ -50,8 +50,8 @@ func (f *Finding) Redact(percent uint) {
 	if percent >= 100 {
 		secret = "REDACTED"
 	}
-	f.Line = strings.Replace(f.Line, f.Secret, secret, -1)
-	f.Match = strings.Replace(f.Match, f.Secret, secret, -1)
+	f.Line = strings.ReplaceAll(f.Line, f.Secret, secret)
+	f.Match = strings.ReplaceAll(f.Match, f.Secret, secret)
 	f.Secret = secret
 }
 

+ 2 - 1
report/template.go

@@ -1,6 +1,7 @@
 package report
 
 import (
+	"errors"
 	"fmt"
 	"io"
 	"os"
@@ -17,7 +18,7 @@ var _ Reporter = (*TemplateReporter)(nil)
 
 func NewTemplateReporter(templatePath string) (*TemplateReporter, error) {
 	if templatePath == "" {
-		return nil, fmt.Errorf("template path cannot be empty")
+		return nil, errors.New("template path cannot be empty")
 	}
 
 	file, err := os.ReadFile(templatePath)

+ 80 - 0
scripts/profile.sh

@@ -0,0 +1,80 @@
+#! /usr/bin/env bash
+# NAME
+#     profile.sh - generate gitleaks profile data
+#
+# USAGE
+#     profile.sh <gitleaks-path> <benchmark-repo-path>
+#
+# DESCRIPTION
+#     Generates profile data for tuning gitleaks performance under ./profile/<timestamp>
+#
+#     Options:
+#         <gitleaks-path>       - gitleaks binary to profile
+#         <benchmark-repo-path> - git repo to run profile against
+#
+# SEE ALSO
+#     Dave Cheney GopherCon 2019 talk on go profiling:
+#
+#     https://www.youtube.com/watch?v=nok0aYiGiYA
+#
+set -euo pipefail
+gitleaks_path="$1"
+test_repo_path="$2"
+base_scan_cmd="${gitleaks_path} --exit-code=0 --max-decode-depth 8"
+base_profile_dir="profile/$(date +%s)"
+
+log() {
+	echo >&2 "$@"
+}
+
+log '========================================================================'
+log 'generating profile data'
+log '------------------------------------------------------------------------'
+# Warm up the fs and also get benchmark data
+for scan_mode in dir git
+do
+	profile_dir="${base_profile_dir}/${scan_mode}"
+	scan_cmd="${base_scan_cmd} ${scan_mode} ${test_repo_path}"
+	mkdir -p "${profile_dir}"
+
+	echo "- mode: ${scan_mode}"
+	# include hyperfine benchmark results if hyperfine is installed
+	if command -v hyperfine > /dev/null
+	then
+		export_path="${profile_dir}/benchmark.json"
+		echo "  benchmark:"
+		echo "    tool: hyperfine"
+		hyperfine -w 3 --export-json "${export_path}" "${scan_cmd}" &> /dev/null
+		echo "    path: ${export_path}"
+		# Show the results if we can :D
+		if command -v yq > /dev/null
+		then
+			echo "    results:"
+			yq -P -oy \
+				'.results[] | pick(["mean","stddev","median","user","system","min","max"])' \
+				"${export_path}" | sed 's/^/      /g'
+		else
+			echo "    view: ${PAGER:-less} ${export_path}"
+		fi
+	fi
+
+	# generate profile data
+	echo "  profile:"
+	for profile_mode in cpu mem trace
+	do
+		# Generate diagnostics data
+		${scan_cmd} --diagnostics="${profile_mode}" --diagnostics-dir="${profile_dir}" &> /dev/null
+		profile_file="$(find "${profile_dir}" -type f -name "${profile_mode}*")"
+		echo "    - mode: ${profile_mode}"
+		echo "      path: ${profile_file}"
+		if [[ "${profile_mode}" = "trace" ]]
+		then
+			echo "      view: go tool trace ${profile_file}"
+		else
+			echo "      view: go tool pprof -http=localhost: ${gitleaks_path} ${profile_file}"
+		fi
+	done
+done
+log '------------------------------------------------------------------------'
+log "results in: ${base_profile_dir}"
+log '========================================================================'

+ 125 - 0
sources/common.go

@@ -0,0 +1,125 @@
+package sources
+
+import (
+	"bufio"
+	"bytes"
+	"context"
+	"io"
+	"path/filepath"
+	"runtime"
+
+	"github.com/mholt/archives"
+	"github.com/zricethezav/gitleaks/v8/config"
+	"github.com/zricethezav/gitleaks/v8/logging"
+)
+
+const maxPeekSize = 25 * 1_000 // 25kB
+var isWhitespace [256]bool
+var isWindows = runtime.GOOS == "windows"
+
+func init() {
+	// define whitespace characters
+	isWhitespace[' '] = true
+	isWhitespace['\t'] = true
+	isWhitespace['\n'] = true
+	isWhitespace['\r'] = true
+}
+
+// isArchive does a light check to see if the provided path is an archive or
+// compressed file. The File source already does this, so this exists mainly
+// to avoid expensive calls before sending things to the File source
+func isArchive(ctx context.Context, path string) bool {
+	format, _, err := archives.Identify(ctx, path, nil)
+	return err == nil && format != nil
+}
+
+// shouldSkipPath checks a path against all the allowlists to see if it can
+// be skipped
+func shouldSkipPath(cfg *config.Config, path string) bool {
+	if cfg == nil {
+		logging.Trace().Str("path", path).Msg("not skipping path because config is nil")
+		return false
+	}
+
+	for _, a := range cfg.Allowlists {
+		if a.PathAllowed(path) ||
+			// TODO: Remove this in v9.
+			// This is an awkward hack to mitigate https://github.com/gitleaks/gitleaks/issues/1641.
+			(isWindows && a.PathAllowed(filepath.ToSlash(path))) {
+			return true
+		}
+	}
+
+	return false
+}
+
+// readUntilSafeBoundary consumes |r| until it finds two consecutive `\n` characters, reading at most
+// |maxPeekSize| bytes beyond the initial |n|. This hopefully avoids splitting related lines across
+// chunk boundaries. (https://github.com/gitleaks/gitleaks/issues/1651)
+func readUntilSafeBoundary(r *bufio.Reader, n int, maxPeekSize int, peekBuf *bytes.Buffer) error {
+	if peekBuf.Len() == 0 {
+		return nil
+	}
+
+	// Does the buffer end in consecutive newlines?
+	var (
+		data         = peekBuf.Bytes()
+		lastChar     = data[len(data)-1]
+		newlineCount = 0 // Tracks consecutive newlines
+	)
+
+	if isWhitespace[lastChar] {
+		for i := len(data) - 1; i >= 0; i-- {
+			lastChar = data[i]
+			if lastChar == '\n' {
+				newlineCount++
+
+				// Stop if two consecutive newlines are found
+				if newlineCount >= 2 {
+					return nil
+				}
+			} else if isWhitespace[lastChar] {
+				// The presence of other whitespace characters (`\r`, ` `, `\t`) shouldn't reset the count.
+				// (Intentionally do nothing.)
+			} else {
+				break
+			}
+		}
+	}
+
+	// If not, read ahead until we (hopefully) find some.
+	newlineCount = 0
+	for {
+		data = peekBuf.Bytes()
+		// Check if the last character is a newline.
+		lastChar = data[len(data)-1]
+		if lastChar == '\n' {
+			newlineCount++
+
+			// Stop if two consecutive newlines are found
+			if newlineCount >= 2 {
+				break
+			}
+		} else if isWhitespace[lastChar] {
+			// The presence of other whitespace characters (`\r`, ` `, `\t`) shouldn't reset the count.
+			// (Intentionally do nothing.)
+		} else {
+			newlineCount = 0 // Reset if a non-whitespace character is found
+		}
+
+		// Stop growing the buffer once it holds |maxPeekSize| bytes beyond the initial |n|
+		if (peekBuf.Len() - n) >= maxPeekSize {
+			break
+		}
+
+		// Read one more byte and append it to the peek buffer
+		b, err := r.ReadByte()
+		if err != nil {
+			if err == io.EOF {
+				break
+			}
+			return err
+		}
+		peekBuf.WriteByte(b)
+	}
+	return nil
+}

+ 3 - 3
detect/directory_test.go → sources/common_test.go

@@ -1,4 +1,4 @@
-package detect
+package sources
 
 import (
 	"bufio"
@@ -65,8 +65,8 @@ func Test_readUntilSafeBoundary(t *testing.T) {
 			require.NoError(t, err)
 
 			// Assert
-			t.Logf(peekBuf.String())
-			require.Equal(t, c.expected, string(peekBuf.Bytes()))
+			t.Log(peekBuf.String())
+			require.Equal(t, c.expected, peekBuf.String())
 		})
 	}
 }

+ 0 - 105
sources/directory.go

@@ -1,105 +0,0 @@
-package sources
-
-import (
-	"io/fs"
-	"os"
-	"path/filepath"
-	"runtime"
-
-	"github.com/fatih/semgroup"
-
-	"github.com/zricethezav/gitleaks/v8/config"
-	"github.com/zricethezav/gitleaks/v8/logging"
-)
-
-type ScanTarget struct {
-	Path    string
-	Symlink string
-}
-
-var isWindows = runtime.GOOS == "windows"
-
-func DirectoryTargets(source string, s *semgroup.Group, followSymlinks bool, allowlists []*config.Allowlist) (<-chan ScanTarget, error) {
-	paths := make(chan ScanTarget)
-	s.Go(func() error {
-		defer close(paths)
-		return filepath.Walk(source,
-			func(path string, fInfo os.FileInfo, err error) error {
-				logger := logging.With().Str("path", path).Logger()
-
-				if err != nil {
-					if os.IsPermission(err) {
-						// This seems to only fail on directories at this stage.
-						logger.Warn().Msg("Skipping directory: permission denied")
-						return filepath.SkipDir
-					}
-					return err
-				}
-
-				// Empty; nothing to do here.
-				if fInfo.Size() == 0 {
-					return nil
-				}
-
-				// Unwrap symlinks, if |followSymlinks| is set.
-				scanTarget := ScanTarget{
-					Path: path,
-				}
-				if fInfo.Mode().Type() == fs.ModeSymlink {
-					if !followSymlinks {
-						logger.Debug().Msg("Skipping symlink")
-						return nil
-					}
-
-					realPath, err := filepath.EvalSymlinks(path)
-					if err != nil {
-						return err
-					}
-
-					realPathFileInfo, _ := os.Stat(realPath)
-					if realPathFileInfo.IsDir() {
-						logger.Warn().Str("target", realPath).Msg("Skipping symlinked directory")
-						return nil
-					}
-
-					scanTarget.Path = realPath
-					scanTarget.Symlink = path
-				}
-
-				// TODO: Also run this check against the resolved symlink?
-				var skip bool
-				for _, a := range allowlists {
-					skip = a.PathAllowed(path) ||
-						// TODO: Remove this in v9.
-						// This is an awkward hack to mitigate https://github.com/gitleaks/gitleaks/issues/1641.
-						(isWindows && a.PathAllowed(filepath.ToSlash(path)))
-					if skip {
-						break
-					}
-				}
-				if fInfo.IsDir() {
-					// Directory
-					if skip {
-						logger.Debug().Msg("Skipping directory due to global allowlist")
-						return filepath.SkipDir
-					}
-
-					if fInfo.Name() == ".git" {
-						// Don't scan .git directories.
-						// TODO: Add this to the config allowlist, instead of hard-coding it.
-						return filepath.SkipDir
-					}
-				} else {
-					// File
-					if skip {
-						logger.Debug().Msg("Skipping file due to global allowlist")
-						return nil
-					}
-
-					paths <- scanTarget
-				}
-				return nil
-			})
-	})
-	return paths, nil
-}

+ 248 - 0
sources/file.go

@@ -0,0 +1,248 @@
+package sources
+
+import (
+	"bufio"
+	"bytes"
+	"context"
+	"fmt"
+	"io"
+	"os"
+	"path/filepath"
+	"strings"
+
+	"github.com/h2non/filetype"
+	"github.com/mholt/archives"
+
+	"github.com/zricethezav/gitleaks/v8/config"
+	"github.com/zricethezav/gitleaks/v8/logging"
+)
+
+const defaultBufferSize = 100 * 1_000 // 100 kB
+const InnerPathSeparator = "!"
+
+type seekReaderAt interface {
+	io.ReaderAt
+	io.Seeker
+}
+
+// File is a source for yielding fragments from a file or other reader
+type File struct {
+	// Content provides a reader to the file's content
+	Content io.Reader
+	// Path is the resolved real path of the file
+	Path string
+	// Symlink represents a symlink to the file if that's how it was discovered
+	Symlink string
+	// Buffer is used for reading the content in chunks
+	Buffer []byte
+	// Config is the gitleaks config used for shouldSkipPath. If not set, then
+	// shouldSkipPath is ignored
+	Config *config.Config
+	// outerPaths is the list of container paths (e.g. archives) that lead to
+	// this file
+	outerPaths []string
+	// MaxArchiveDepth limits how deep the sources will explore nested archives
+	MaxArchiveDepth int
+	// archiveDepth is the current archive nesting depth
+	archiveDepth int
+}
+
+// Fragments yields fragments for this source
+func (s *File) Fragments(ctx context.Context, yield FragmentsFunc) error {
+	format, _, err := archives.Identify(ctx, s.Path, nil)
+	// Process the file as an archive if there's no error && Identify returns
+	// a format; but if there's an error or no format, just swallow the error
+	// and fall back on treating it like a normal file and let fileFragments
+	// decide what to do with it.
+	if err == nil && format != nil {
+		if s.archiveDepth+1 > s.MaxArchiveDepth {
+			logging.Warn().Str(
+				"path", s.FullPath(),
+			).Int(
+				"max_archive_depth", s.MaxArchiveDepth,
+			).Msg("skipping archive: exceeds max archive depth")
+			return nil
+		}
+		if extractor, ok := format.(archives.Extractor); ok {
+			return s.extractorFragments(ctx, extractor, s.Content, yield)
+		}
+		if decompressor, ok := format.(archives.Decompressor); ok {
+			return s.decompressorFragments(decompressor, s.Content, yield)
+		}
+		logging.Warn().Str("path", s.FullPath()).Msg("skipping unknown archive type")
+	}
+
+	return s.fileFragments(bufio.NewReader(s.Content), yield)
+}
+
+// extractorFragments recursively crawls archives and yields fragments
+func (s *File) extractorFragments(ctx context.Context, extractor archives.Extractor, reader io.Reader, yield FragmentsFunc) error {
+	if _, isSeekReaderAt := reader.(seekReaderAt); !isSeekReaderAt {
+		switch extractor.(type) {
+		case archives.SevenZip, archives.Zip:
+			tmpfile, err := os.CreateTemp("", "gitleaks-archive-")
+			if err != nil {
+				logging.Error().Str("path", s.FullPath()).Msg("could not create tmp file")
+				return nil
+			}
+			defer func() {
+				_ = tmpfile.Close()
+				_ = os.Remove(tmpfile.Name())
+			}()
+
+			_, err = io.Copy(tmpfile, reader)
+			if err != nil {
+				logging.Error().Str("path", s.FullPath()).Msg("could not copy archive file")
+				return nil
+			}
+
+			reader = tmpfile
+		}
+	}
+
+	return extractor.Extract(ctx, reader, func(_ context.Context, d archives.FileInfo) error {
+		if d.IsDir() {
+			return nil
+		}
+
+		innerReader, err := d.Open()
+		if err != nil {
+			logging.Error().Err(err).Str("path", s.FullPath()).Msg("could not open archive inner file")
+			return nil
+		}
+		defer innerReader.Close()
+		path := filepath.Clean(d.NameInArchive)
+
+		if s.Config != nil && shouldSkipPath(s.Config, path) {
+			logging.Debug().Str("path", s.FullPath()).Msg("skipping file: global allowlist")
+			return nil
+		}
+
+		file := &File{
+			Content:         innerReader,
+			Path:            path,
+			Symlink:         s.Symlink,
+			outerPaths:      append(s.outerPaths, filepath.ToSlash(s.Path)),
+			MaxArchiveDepth: s.MaxArchiveDepth,
+			archiveDepth:    s.archiveDepth + 1,
+		}
+
+		if err := file.Fragments(ctx, yield); err != nil {
+			return err
+		}
+
+		return nil
+	})
+}
+
+// decompressorFragments decompresses a single compressed stream and yields its fragments
+func (s *File) decompressorFragments(decompressor archives.Decompressor, reader io.Reader, yield FragmentsFunc) error {
+	innerReader, err := decompressor.OpenReader(reader)
+	if err != nil {
+		logging.Error().Err(err).Str("path", s.FullPath()).Msg("could not read compressed file")
+		return nil
+	}
+
+	if err := s.fileFragments(bufio.NewReader(innerReader), yield); err != nil {
+		_ = innerReader.Close()
+		return err
+	}
+
+	_ = innerReader.Close()
+	return nil
+}
+
+// fileFragments reads the file into fragments to yield
+func (s *File) fileFragments(reader *bufio.Reader, yield FragmentsFunc) error {
+	// Create a buffer if the caller hasn't provided one
+	if s.Buffer == nil {
+		s.Buffer = make([]byte, defaultBufferSize)
+	}
+
+	totalLines := 0
+	for {
+		fragment := Fragment{
+			FilePath: s.FullPath(),
+		}
+
+		n, err := reader.Read(s.Buffer)
+		if n == 0 {
+			if err != nil && err != io.EOF {
+				return yield(fragment, fmt.Errorf("could not read file: %w", err))
+			}
+
+			return nil
+		}
+
+		// Only check the filetype at the start of file.
+		if totalLines == 0 {
+			// TODO: could other optimizations be introduced here?
+			if mimetype, err := filetype.Match(s.Buffer[:n]); err != nil {
+				return yield(
+					fragment,
+					fmt.Errorf("could not read file: could not determine type: %w", err),
+				)
+			} else if mimetype.MIME.Type == "application" {
+				logging.Debug().
+					Str("mime_type", mimetype.MIME.Value).
+					Str("path", s.FullPath()).
+					Msgf("skipping binary file")
+
+				return nil
+			}
+		}
+
+		// Try to split chunks across large areas of whitespace, if possible.
+		peekBuf := bytes.NewBuffer(s.Buffer[:n])
+		if err := readUntilSafeBoundary(reader, n, maxPeekSize, peekBuf); err != nil {
+			return yield(
+				fragment,
+				fmt.Errorf("could not read file: could not read until safe boundary: %w", err),
+			)
+		}
+
+		fragment.Raw = peekBuf.String()
+		fragment.Bytes = peekBuf.Bytes()
+		fragment.StartLine = totalLines + 1
+
+		// Count the number of newlines in this chunk
+		totalLines += strings.Count(fragment.Raw, "\n")
+
+		if len(s.Symlink) > 0 {
+			fragment.SymlinkFile = s.Symlink
+		}
+
+		if isWindows {
+			fragment.FilePath = filepath.ToSlash(fragment.FilePath)
+			fragment.SymlinkFile = filepath.ToSlash(s.Symlink)
+			fragment.WindowsFilePath = s.FullPath()
+		}
+
+		// log errors but continue since there's content
+		if err != nil && err != io.EOF {
+			logging.Warn().Err(err).Msgf("issue reading file")
+		}
+
+		// Done with the file!
+		if err == io.EOF {
+			return yield(fragment, nil)
+		}
+
+		if err := yield(fragment, err); err != nil {
+			return err
+		}
+	}
+}
+
+// FullPath returns the File.Path with any preceding outer paths
+func (s *File) FullPath() string {
+	if len(s.outerPaths) > 0 {
+		return strings.Join(
+			// outerPaths have already been normalized to slash
+			append(s.outerPaths, s.Path),
+			InnerPathSeparator,
+		)
+	}
+
+	return s.Path
+}
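
Reviewer note: `FullPath` reports files inside archives by joining each container path and the inner path with `InnerPathSeparator` (`!`). A minimal standalone sketch of that joining follows; `fullPath` is a hypothetical free function, and copying the slice before appending is my own choice here rather than something the diff does.

```go
package main

import (
	"fmt"
	"strings"
)

// innerPathSeparator mirrors InnerPathSeparator from the diff.
const innerPathSeparator = "!"

// fullPath is a hypothetical standalone version of File.FullPath: it joins the
// outer container paths and the inner path with "!".
func fullPath(outerPaths []string, path string) string {
	if len(outerPaths) == 0 {
		return path
	}
	// Copy before appending so the caller's slice is never mutated.
	parts := make([]string, 0, len(outerPaths)+1)
	parts = append(parts, outerPaths...)
	parts = append(parts, path)
	return strings.Join(parts, innerPathSeparator)
}

func main() {
	fmt.Println(fullPath(nil, "main.go"))
	fmt.Println(fullPath([]string{"nested.tar.gz", "archives/files.tar"}, "files/api.go"))
}
```

A finding inside a nested archive would then carry a path such as `nested.tar.gz!archives/files.tar!files/api.go`, which keeps the whole container chain visible in reports.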

+ 180 - 0
sources/files.go

@@ -0,0 +1,180 @@
+package sources
+
+import (
+	"context"
+	"errors"
+	"io/fs"
+	"os"
+	"path/filepath"
+	"sync"
+
+	"github.com/fatih/semgroup"
+	"github.com/zricethezav/gitleaks/v8/config"
+	"github.com/zricethezav/gitleaks/v8/logging"
+)
+
+// TODO: remove this in v9 and have scanTargets yield file sources
+type ScanTarget struct {
+	Path    string
+	Symlink string
+}
+
+// Deprecated: Use Files and detector.DetectSource instead
+func DirectoryTargets(sourcePath string, s *semgroup.Group, followSymlinks bool, allowlists []*config.Allowlist) (<-chan ScanTarget, error) {
+	paths := make(chan ScanTarget)
+
+	// create a Files source
+	files := Files{
+		FollowSymlinks: followSymlinks,
+		Path:           sourcePath,
+		Sema:           s,
+		Config: &config.Config{
+			Allowlists: allowlists,
+		},
+	}
+
+	s.Go(func() error {
+		err := files.scanTargets(func(scanTarget ScanTarget, err error) error {
+			paths <- scanTarget
+			return nil
+		})
+		close(paths)
+		return err
+	})
+
+	return paths, nil
+}
+
+// Files is a source for yielding fragments from a collection of files
+type Files struct {
+	Config          *config.Config
+	FollowSymlinks  bool
+	MaxFileSize     int
+	Path            string
+	Sema            *semgroup.Group
+	MaxArchiveDepth int
+}
+
+// scanTargets yields scan targets to a callback func
+func (s *Files) scanTargets(yield func(ScanTarget, error) error) error {
+	return filepath.WalkDir(s.Path, func(path string, d fs.DirEntry, err error) error {
+		scanTarget := ScanTarget{Path: path}
+		logger := logging.With().Str("path", path).Logger()
+
+		if err != nil {
+			if os.IsPermission(err) {
+				// This seems to only fail on directories at this stage.
+				logger.Warn().Err(errors.New("permission denied")).Msg("skipping directory")
+				return filepath.SkipDir
+			}
+			logger.Warn().Err(err).Msg("skipping")
+			return nil
+		}
+
+		info, err := d.Info()
+		if err != nil {
+			if d.IsDir() {
+				logger.Error().Err(err).Msg("skipping directory: could not get info")
+				return filepath.SkipDir
+			}
+			logger.Error().Err(err).Msg("skipping file: could not get info")
+			return nil
+		}
+
+		if !d.IsDir() {
+			// Empty; nothing to do here.
+			if info.Size() == 0 {
+				logger.Debug().Msg("skipping empty file")
+				return nil
+			}
+
+			// Too large; nothing to do here.
+			if s.MaxFileSize > 0 && info.Size() > int64(s.MaxFileSize) {
+				logger.Warn().Msgf(
+					"skipping file: too large max_size=%dMB, size=%dMB",
+					s.MaxFileSize/1_000_000, info.Size()/1_000_000,
+				)
+				return nil
+			}
+		}
+
+		// set the initial scan target values
+		if d.Type() == fs.ModeSymlink {
+			if !s.FollowSymlinks {
+				logger.Debug().Msg("skipping symlink: follow symlinks disabled")
+				return nil
+			}
+			realPath, err := filepath.EvalSymlinks(path)
+			if err != nil {
+				logger.Error().Err(err).Msg("skipping symlink: could not evaluate")
+				return nil
+			}
+			if realPathFileInfo, _ := os.Stat(realPath); realPathFileInfo.IsDir() {
+				logger.Debug().Str("target", realPath).Msgf("skipping symlink: target is directory")
+				return nil
+			}
+			scanTarget = ScanTarget{
+				Path:    realPath,
+				Symlink: path,
+			}
+		}
+
+		// handle dir cases (mainly just see if it should be skipped)
+		if info.IsDir() {
+			if shouldSkipPath(s.Config, path) {
+				logger.Debug().Msg("skipping directory: global allowlist")
+				return filepath.SkipDir
+			}
+			return nil
+		}
+
+		if shouldSkipPath(s.Config, path) {
+			logger.Debug().Msg("skipping file: global allowlist")
+			return nil
+		}
+
+		return yield(scanTarget, nil)
+	})
+}
+
+// Fragments yields fragments from files discovered under the path
+func (s *Files) Fragments(ctx context.Context, yield FragmentsFunc) error {
+	var wg sync.WaitGroup
+
+	err := s.scanTargets(func(scanTarget ScanTarget, err error) error {
+		wg.Add(1)
+		s.Sema.Go(func() error {
+			logger := logging.With().Str("path", scanTarget.Path).Logger()
+			logger.Trace().Msg("scanning path")
+
+			f, err := os.Open(scanTarget.Path)
+			if err != nil {
+				if os.IsPermission(err) {
+					logger.Warn().Msg("skipping file: permission denied")
+				}
+				wg.Done()
+				return nil
+			}
+
+			// Convert this to a file source
+			file := File{
+				Content:         f,
+				Path:            scanTarget.Path,
+				Symlink:         scanTarget.Symlink,
+				Config:          s.Config,
+				MaxArchiveDepth: s.MaxArchiveDepth,
+			}
+
+			err = file.Fragments(ctx, yield)
+			// Avoiding a defer in a hot loop
+			_ = f.Close()
+			wg.Done()
+			return err
+		})
+
+		return nil
+	})
+
+	wg.Wait()
+	return err
+}
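
Reviewer note: `scanTargets` skips empty files and, when `MaxFileSize` is positive, files larger than that limit. The sketch below isolates just that decision so the edge cases are easy to see; `skipReason` is a hypothetical helper, and the message format is copied from the log line in the diff.

```go
package main

import "fmt"

// skipReason is a hypothetical helper mirroring the size checks in scanTargets.
// It returns an empty string when the file should be scanned.
func skipReason(maxFileSize int, size int64) string {
	switch {
	case size == 0:
		return "empty"
	case maxFileSize > 0 && size > int64(maxFileSize):
		// A limit of 0 (or less) means "no limit", as in the diff.
		return fmt.Sprintf("too large max_size=%dMB, size=%dMB",
			maxFileSize/1_000_000, size/1_000_000)
	default:
		return ""
	}
}

func main() {
	fmt.Printf("%q\n", skipReason(0, 0))
	fmt.Printf("%q\n", skipReason(10_000_000, 25_000_000))
	fmt.Printf("%q\n", skipReason(10_000_000, 10_000_000))
}
```

Note that a file exactly at the limit is still scanned, because the comparison is strictly greater-than.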

+ 26 - 0
sources/fragment.go

@@ -0,0 +1,26 @@
+package sources
+
+// Fragment represents a fragment of a source with its metadata
+type Fragment struct {
+	// Raw is the raw content of the fragment
+	Raw string
+
+	Bytes []byte
+
+	// FilePath is the path to the file if applicable.
+	// The path separator MUST be normalized to `/`.
+	FilePath    string
+	SymlinkFile string
+	// WindowsFilePath is the path with the original separator.
+	// This provides a backwards-compatible solution to https://github.com/gitleaks/gitleaks/issues/1565.
+	WindowsFilePath string `json:"-"` // TODO: remove this in v9.
+
+	// CommitSHA is the SHA of the commit if applicable
+	CommitSHA string // TODO: remove this in v9 and use CommitInfo instead
+
+	// StartLine is the line number this fragment starts on
+	StartLine int
+
+	// CommitInfo captures additional information about the git commit if applicable
+	CommitInfo *CommitInfo
+}

+ 301 - 2
sources/git.go

@@ -2,15 +2,24 @@ package sources
 
 import (
 	"bufio"
+	"bytes"
+	"context"
 	"errors"
+	"fmt"
 	"io"
+	"net/url"
 	"os/exec"
 	"path/filepath"
 	"regexp"
 	"strings"
+	"sync"
+	"time"
 
+	"github.com/fatih/semgroup"
 	"github.com/gitleaks/go-gitdiff/gitdiff"
 
+	"github.com/zricethezav/gitleaks/v8/cmd/scm"
+	"github.com/zricethezav/gitleaks/v8/config"
 	"github.com/zricethezav/gitleaks/v8/logging"
 )
 
@@ -21,6 +30,33 @@ type GitCmd struct {
 	cmd         *exec.Cmd
 	diffFilesCh <-chan *gitdiff.File
 	errCh       <-chan error
+	repoPath    string
+}
+
+// blobReader wraps the output of `git cat-file blob` in a ReadCloser so that
+// a blob can be streamed from a repo
+type blobReader struct {
+	io.ReadCloser
+	cmd *exec.Cmd
+}
+
+// Close closes the underlying reader and then waits for the command to complete,
+// releasing its resources.
+func (br *blobReader) Close() error {
+	// Discard the remaining data from the pipe to avoid blocking
+	_, drainErr := io.Copy(io.Discard, br)
+	// Close the pipe (should signal the command to stop if it hasn't already)
+	closeErr := br.ReadCloser.Close()
+	// Wait to prevent zombie processes.
+	waitErr := br.cmd.Wait()
+	// Return the first error encountered
+	if drainErr != nil {
+		return drainErr
+	}
+	if closeErr != nil {
+		return closeErr
+	}
+	return waitErr
 }
 
 // NewGitLogCmd returns `*DiffFilesCmd` with two channels: `<-chan *gitdiff.File` and `<-chan error`.
@@ -49,7 +85,7 @@ func NewGitLogCmd(source string, logOpts string) (*GitCmd, error) {
 		cmd = exec.Command("git", args...)
 	} else {
 		cmd = exec.Command("git", "-C", sourceClean, "log", "-p", "-U0",
-			"--full-history", "--all")
+			"--full-history", "--all", "--diff-filter=tuxdb")
 	}
 
 	logging.Debug().Msgf("executing: %s", cmd.String())
@@ -78,6 +114,7 @@ func NewGitLogCmd(source string, logOpts string) (*GitCmd, error) {
 		cmd:         cmd,
 		diffFilesCh: gitdiffFiles,
 		errCh:       errCh,
+		repoPath:    sourceClean,
 	}, nil
 }
 
@@ -118,6 +155,7 @@ func NewGitDiffCmd(source string, staged bool) (*GitCmd, error) {
 		cmd:         cmd,
 		diffFilesCh: gitdiffFiles,
 		errCh:       errCh,
+		repoPath:    sourceClean,
 	}, nil
 }
 
@@ -135,10 +173,31 @@ func (c *GitCmd) ErrCh() <-chan error {
 // stdin or copying from stdout or stderr to complete.
 //
 // Wait also closes underlying stdout and stderr.
-func (c *GitCmd) Wait() (err error) {
+func (c *GitCmd) Wait() error {
 	return c.cmd.Wait()
 }
 
+// NewBlobReader returns an io.ReadCloser that can be used to read a blob
+// within the git repo used to create the GitCmd.
+//
+// The caller is responsible for closing the reader.
+func (c *GitCmd) NewBlobReader(commit, path string) (io.ReadCloser, error) {
+	gitArgs := []string{"-C", c.repoPath, "cat-file", "blob", commit + ":" + path}
+	cmd := exec.Command("git", gitArgs...)
+	cmd.Stderr = io.Discard
+	stdout, err := cmd.StdoutPipe()
+	if err != nil {
+		return nil, fmt.Errorf("failed to get stdout pipe: %w", err)
+	}
+	if err := cmd.Start(); err != nil {
+		return nil, fmt.Errorf("failed to start git command: %w", err)
+	}
+	return &blobReader{
+		ReadCloser: stdout,
+		cmd:        cmd,
+	}, nil
+}
+
 // listenForStdErr listens for stderr output from git, prints it to stdout,
 // sends to errCh and closes it.
 func listenForStdErr(stderr io.ReadCloser, errCh chan<- error) {
@@ -187,3 +246,243 @@ func listenForStdErr(stderr io.ReadCloser, errCh chan<- error) {
 		return
 	}
 }
+
+// RemoteInfo provides the info needed for reconstructing links from findings
+type RemoteInfo struct {
+	Platform scm.Platform
+	Url      string
+}
+
+// Git is a source for yielding fragments from a git repo
+type Git struct {
+	Cmd             *GitCmd
+	Config          *config.Config
+	Remote          *RemoteInfo
+	Sema            *semgroup.Group
+	MaxArchiveDepth int
+}
+
+// CommitInfo captures metadata about the commit
+type CommitInfo struct {
+	AuthorEmail string
+	AuthorName  string
+	Date        string
+	Message     string
+	Remote      *RemoteInfo
+	SHA         string
+}
+
+// Fragments yields fragments from a git repo
+func (s *Git) Fragments(ctx context.Context, yield FragmentsFunc) error {
+	defer func() {
+		_ = s.Cmd.Wait()
+	}()
+
+	var (
+		diffFilesCh = s.Cmd.DiffFilesCh()
+		errCh       = s.Cmd.ErrCh()
+		wg          sync.WaitGroup
+	)
+
+	// loop to range over both DiffFiles (stdout) and ErrCh (stderr)
+	for diffFilesCh != nil || errCh != nil {
+		select {
+		case gitdiffFile, open := <-diffFilesCh:
+			if !open {
+				diffFilesCh = nil
+				break
+			}
+
+			if gitdiffFile.IsDelete {
+				continue
+			}
+
+			// skip non-archive binary files
+			yieldAsArchive := false
+			if gitdiffFile.IsBinary {
+				if !isArchive(ctx, gitdiffFile.NewName) {
+					continue
+				}
+				yieldAsArchive = true
+			}
+
+			// Check if commit is allowed
+			commitSHA := ""
+			var commitInfo *CommitInfo
+			if gitdiffFile.PatchHeader != nil {
+				commitSHA = gitdiffFile.PatchHeader.SHA
+				for _, a := range s.Config.Allowlists {
+					if ok, c := a.CommitAllowed(gitdiffFile.PatchHeader.SHA); ok {
+						logging.Trace().Str("allowed-commit", c).Msg("skipping commit: global allowlist")
+						continue
+					}
+				}
+
+				commitInfo = &CommitInfo{
+					Date:    gitdiffFile.PatchHeader.AuthorDate.UTC().Format(time.RFC3339),
+					Message: gitdiffFile.PatchHeader.Message(),
+					Remote:  s.Remote,
+					SHA:     commitSHA,
+				}
+
+				if gitdiffFile.PatchHeader.Author != nil {
+					commitInfo.AuthorName = gitdiffFile.PatchHeader.Author.Name
+					commitInfo.AuthorEmail = gitdiffFile.PatchHeader.Author.Email
+				}
+			}
+
+			wg.Add(1)
+			s.Sema.Go(func() error {
+				defer wg.Done()
+
+				if yieldAsArchive {
+					blob, err := s.Cmd.NewBlobReader(commitSHA, gitdiffFile.NewName)
+					if err != nil {
+						logging.Error().Err(err).Msg("could not read archive blob")
+						return nil
+					}
+
+					file := File{
+						Content:         blob,
+						Path:            gitdiffFile.NewName,
+						MaxArchiveDepth: s.MaxArchiveDepth,
+						Config:          s.Config,
+					}
+
+					// enrich and yield fragments
+					err = file.Fragments(ctx, func(fragment Fragment, err error) error {
+						fragment.CommitSHA = commitSHA
+						fragment.CommitInfo = commitInfo
+						return yield(fragment, err)
+					})
+
+					// Close the blob reader and log any issues
+					if err := blob.Close(); err != nil {
+						logging.Debug().Err(err).Msg("blobReader.Close() returned an error")
+					}
+
+					return err
+				}
+
+				for _, textFragment := range gitdiffFile.TextFragments {
+					if textFragment == nil {
+						return nil
+					}
+
+					fragment := Fragment{
+						CommitSHA:  commitSHA,
+						FilePath:   gitdiffFile.NewName,
+						Raw:        textFragment.Raw(gitdiff.OpAdd),
+						StartLine:  int(textFragment.NewPosition),
+						CommitInfo: commitInfo,
+					}
+
+					if err := yield(fragment, nil); err != nil {
+						return err
+					}
+				}
+
+				return nil
+			})
+		case err, open := <-errCh:
+			if !open {
+				errCh = nil
+				break
+			}
+
+			return yield(Fragment{}, err)
+		}
+	}
+
+	wg.Wait()
+	return nil
+}
+
+// NewRemoteInfo builds a new RemoteInfo for generating finding links
+func NewRemoteInfo(platform scm.Platform, source string) *RemoteInfo {
+	if platform == scm.NoPlatform {
+		return &RemoteInfo{Platform: platform}
+	}
+
+	remoteUrl, err := getRemoteUrl(source)
+	if err != nil {
+		if strings.Contains(err.Error(), "No remote configured") {
+			logging.Debug().Msg("skipping finding links: repository has no configured remote.")
+			platform = scm.NoPlatform
+		} else {
+			logging.Error().Err(err).Msg("skipping finding links: unable to parse remote URL")
+		}
+		goto End
+	}
+
+	if platform == scm.UnknownPlatform {
+		platform = platformFromHost(remoteUrl)
+		if platform == scm.UnknownPlatform {
+			logging.Info().
+				Str("host", remoteUrl.Hostname()).
+				Msg("Unknown SCM platform. Use --platform to include links in findings.")
+		} else {
+			logging.Debug().
+				Str("host", remoteUrl.Hostname()).
+				Str("platform", platform.String()).
+				Msg("SCM platform parsed from host")
+		}
+	}
+
+End:
+	var rUrl string
+	if remoteUrl != nil {
+		rUrl = remoteUrl.String()
+	}
+	return &RemoteInfo{
+		Platform: platform,
+		Url:      rUrl,
+	}
+}
+
+var sshUrlpat = regexp.MustCompile(`^git@([a-zA-Z0-9.-]+):([\w/.-]+?)(?:\.git)?$`)
+
+func getRemoteUrl(source string) (*url.URL, error) {
+	// This will return the first remote — typically, "origin".
+	cmd := exec.Command("git", "ls-remote", "--quiet", "--get-url")
+	if source != "." {
+		cmd.Dir = source
+	}
+
+	stdout, err := cmd.Output()
+	if err != nil {
+		var exitError *exec.ExitError
+		if errors.As(err, &exitError) {
+			return nil, fmt.Errorf("command failed (%d): %w, stderr: %s", exitError.ExitCode(), err, string(bytes.TrimSpace(exitError.Stderr)))
+		}
+		return nil, err
+	}
+
+	remoteUrl := string(bytes.TrimSpace(stdout))
+	if matches := sshUrlpat.FindStringSubmatch(remoteUrl); matches != nil {
+		remoteUrl = fmt.Sprintf("https://%s/%s", matches[1], matches[2])
+	}
+	remoteUrl = strings.TrimSuffix(remoteUrl, ".git")
+
+	parsedUrl, err := url.Parse(remoteUrl)
+	if err != nil {
+		return nil, fmt.Errorf("unable to parse remote URL: %w", err)
+	}
+
+	// Remove any user info.
+	parsedUrl.User = nil
+	return parsedUrl, nil
+}
+
+func platformFromHost(u *url.URL) scm.Platform {
+	switch strings.ToLower(u.Hostname()) {
+	case "github.com":
+		return scm.GitHubPlatform
+	case "gitlab.com":
+		return scm.GitLabPlatform
+	case "dev.azure.com", "visualstudio.com":
+		return scm.AzureDevOpsPlatform
+	default:
+		return scm.UnknownPlatform
+	}
+}
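
Reviewer note: `getRemoteUrl` rewrites SSH-style remotes (`git@host:owner/repo.git`) into HTTPS URLs, trims a trailing `.git`, and drops any user info before the host is used for platform detection. Here is a minimal sketch of just the string handling, without shelling out to git; `normalizeRemoteURL` is a hypothetical name, and the regular expression is the one from the diff.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// Same pattern as sshUrlpat in the diff.
var sshURLPattern = regexp.MustCompile(`^git@([a-zA-Z0-9.-]+):([\w/.-]+?)(?:\.git)?$`)

// normalizeRemoteURL is a hypothetical helper covering only the parsing half
// of getRemoteUrl: SSH form to HTTPS, ".git" suffix removed, credentials dropped.
func normalizeRemoteURL(raw string) (*url.URL, error) {
	remote := strings.TrimSpace(raw)
	if m := sshURLPattern.FindStringSubmatch(remote); m != nil {
		remote = fmt.Sprintf("https://%s/%s", m[1], m[2])
	}
	remote = strings.TrimSuffix(remote, ".git")

	parsed, err := url.Parse(remote)
	if err != nil {
		return nil, fmt.Errorf("unable to parse remote URL: %w", err)
	}
	parsed.User = nil // never carry credentials into finding links
	return parsed, nil
}

func main() {
	for _, raw := range []string{
		"git@github.com:gitleaks/test.git",
		"https://user:token@gitlab.com/group/project.git\n",
	} {
		u, err := normalizeRemoteURL(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(u.String(), u.Hostname())
	}
}
```

The hostname of the result is what a switch like `platformFromHost` would inspect.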

+ 16 - 0
sources/source.go

@@ -0,0 +1,16 @@
+package sources
+
+import (
+	"context"
+)
+
+// FragmentsFunc is the type of function called by Fragments to yield the next
+// fragment
+type FragmentsFunc func(fragment Fragment, err error) error
+
+// Source is a thing that can yield fragments
+type Source interface {
+	// Fragments provides a filepath.WalkDir-like interface for scanning the
+	// fragments in the source
+	Fragments(ctx context.Context, yield FragmentsFunc) error
+}
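
Reviewer note: anything that can call a `FragmentsFunc` for each piece of content satisfies `Source`. As an illustration only, here is a hypothetical in-memory source; the `Fragment` type is trimmed to three fields for brevity, and `chunkSource` and `startLines` are made-up names. The line bookkeeping follows the same rule as `fileFragments` in this commit (start line is the running newline count plus one).

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Fragment is a trimmed-down copy of sources.Fragment for this sketch.
type Fragment struct {
	Raw       string
	FilePath  string
	StartLine int
}

type FragmentsFunc func(fragment Fragment, err error) error

type Source interface {
	Fragments(ctx context.Context, yield FragmentsFunc) error
}

// chunkSource is a hypothetical Source that yields pre-split chunks of text.
type chunkSource struct {
	path   string
	chunks []string
}

func (s *chunkSource) Fragments(ctx context.Context, yield FragmentsFunc) error {
	totalLines := 0
	for _, chunk := range s.chunks {
		if err := ctx.Err(); err != nil {
			return err // stop early if the caller cancelled
		}
		f := Fragment{Raw: chunk, FilePath: s.path, StartLine: totalLines + 1}
		if err := yield(f, nil); err != nil {
			return err // the consumer can abort the walk, like filepath.WalkDir
		}
		totalLines += strings.Count(chunk, "\n")
	}
	return nil
}

// startLines drains a Source and records where each fragment starts.
func startLines(src Source) ([]int, error) {
	var lines []int
	err := src.Fragments(context.Background(), func(f Fragment, err error) error {
		if err != nil {
			return err
		}
		lines = append(lines, f.StartLine)
		return nil
	})
	return lines, err
}

func main() {
	src := &chunkSource{path: "demo.txt", chunks: []string{"a\nb\n", "c\n", "d"}}
	fmt.Println(startLines(src))
}
```

Because the consumer is a callback, the same detector code can sit behind files, archives, and git history without knowing which one produced a fragment.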

BIN
testdata/archives/files.7z


BIN
testdata/archives/files.tar


BIN
testdata/archives/files.tar.xz


BIN
testdata/archives/files.tar.zst


BIN
testdata/archives/files.zip


+ 6 - 0
testdata/archives/files/.env.prod

@@ -0,0 +1,6 @@
+DB_HOST=example.com
+DB_PORT=443
+DB_USERNAME=postgres
+DB_PASSWORD=8ae31cacf141669ddfb5da
+DB_NAME=best_db
+DB_SSL=true

+ 1 - 0
testdata/archives/files/.gitleaksignore

@@ -0,0 +1 @@
+../testdata/repos/nogit/api.go:aws-access-key:20

+ 24 - 0
testdata/archives/files/api.go

@@ -0,0 +1,24 @@
+package main
+
+import "fmt"
+
+func main() {
+
+	var a = "initial"
+	fmt.Println(a)
+
+	var b, c int = 1, 2
+	fmt.Println(b, c)
+
+	var d = true
+	fmt.Println(d)
+
+	var e int
+	fmt.Println(e)
+
+	// opps I added a secret at line 20
+	awsToken := "AKIALALEMEL33243OLIA"
+
+	f := "apple"
+	fmt.Println(f)
+}

+ 24 - 0
testdata/archives/files/main.go

@@ -0,0 +1,24 @@
+package main
+
+import "fmt"
+
+func main() {
+
+	var a = "initial"
+	fmt.Println(a)
+
+	var b, c int = 1, 2
+	fmt.Println(b, c)
+
+	var d = true
+	fmt.Println(d)
+
+	var e int
+	fmt.Println(e)
+
+	// opps I added a secret at line 20
+	awsToken := "AKIALALEMEL33243OLIA"
+
+	f := "apple"
+	fmt.Println(f)
+}

BIN
testdata/archives/files/main.go.gz


BIN
testdata/archives/files/main.go.xz


BIN
testdata/archives/files/main.go.zst


BIN
testdata/archives/nested.tar.gz


+ 21 - 0
testdata/config/archives.toml

@@ -0,0 +1,21 @@
+title = "gitleaks config"
+# https://learnxinyminutes.com/docs/toml/ for toml reference
+
+[[rules]]
+id = "aws-access-key"
+description = "AWS Access Key"
+regex = '''(?:A3T[A-Z0-9]|AKIA|ASIA|ABIA|ACCA)[A-Z0-9]{16}'''
+tags = ["key", "AWS"]
+
+# Here to confirm that allowlists work in archives
+[[rules]]
+id = 'password'
+description = "Find the DB password in .env.prod"
+path = '''\.env\.prod$'''
+regex = '''(?i)password=([^\s]+)'''
+
+# Now ignore it to confirm allowlists work
+[[allowlists]]
+paths = [
+  '''\.env\.prod$''',
+]
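
Reviewer note: this test config defines a `password` rule scoped to `.env.prod` and then allowlists that same path, so the expected outcome is that the password is not reported while the AWS key in the Go files still is. The sketch below is a simplified model of that interaction using only the regular expressions from the config; it is not how gitleaks evaluates rules internally, and `passwordSecret` and `hasAWSKey` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from testdata/config/archives.toml.
var (
	awsRule      = regexp.MustCompile(`(?:A3T[A-Z0-9]|AKIA|ASIA|ABIA|ACCA)[A-Z0-9]{16}`)
	passwordRule = regexp.MustCompile(`(?i)password=([^\s]+)`)
	envProdPath  = regexp.MustCompile(`\.env\.prod$`)
)

// passwordSecret models the "password" rule: it only applies to paths matching
// the rule's path pattern, and the global allowlist (same pattern) can veto it.
func passwordSecret(path, line string, honorAllowlist bool) (string, bool) {
	if !envProdPath.MatchString(path) {
		return "", false // rule is scoped to .env.prod
	}
	if honorAllowlist && envProdPath.MatchString(path) {
		return "", false // path is allowlisted, so nothing is reported
	}
	m := passwordRule.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[1], true
}

// hasAWSKey reports whether the aws-access-key rule matches the line.
func hasAWSKey(line string) bool {
	return awsRule.MatchString(line)
}

func main() {
	line := "DB_PASSWORD=8ae31cacf141669ddfb5da"
	fmt.Println(passwordSecret("files/.env.prod", line, false))
	fmt.Println(passwordSecret("files/.env.prod", line, true))
	fmt.Println(hasAWSKey(`awsToken := "AKIALALEMEL33243OLIA"`))
}
```

With the allowlist honored, only the AWS key remains, which is what makes this config useful for checking that allowlists apply to paths inside archives.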

+ 0 - 1
testdata/config/simple.toml

@@ -225,4 +225,3 @@ title = "gitleaks config"
     description = "PyPI upload token"
     regex = '''pypi-AgEIcHlwaS5vcmc[A-Za-z0-9-_]{50,1000}'''
     tags = ["key", "pypi"]
-

+ 0 - 0
testdata/repos/archives/.gitleaksignore


+ 10 - 0
testdata/repos/archives/README.md

@@ -0,0 +1,10 @@
+# Archives
+
+This repo has some archive files in its history!
+
+Commits:
+
+```
+07d2bd71800f1abf0421abe9bc4a83a6fdca1f68 nested.tar.gz
+db8789716fc664dbce0ed2d492570e92abf717a5 main.go.zst
+```

+ 1 - 0
testdata/repos/archives/dotGit/HEAD

@@ -0,0 +1 @@
+ref: refs/heads/main

+ 1 - 0
testdata/repos/archives/dotGit/ORIG_HEAD

@@ -0,0 +1 @@
+15fa60c13dccec6add267b7baa065977a6cc748a

+ 13 - 0
testdata/repos/archives/dotGit/config

@@ -0,0 +1,13 @@
+[core]
+	repositoryformatversion = 0
+	filemode = true
+	bare = false
+	logallrefupdates = true
+
+[remote "origin"]
+        url = git@github.com:gitleaks/test.git
+        fetch = +refs/heads/*:refs/remotes/origin/*
+
+[branch "main"]
+        remote = origin
+        merge = refs/heads/main

+ 1 - 0
testdata/repos/archives/dotGit/description

@@ -0,0 +1 @@
+Unnamed repository; edit this file 'description' to name the repository.

BIN
testdata/repos/archives/dotGit/index


+ 6 - 0
testdata/repos/archives/dotGit/info/exclude

@@ -0,0 +1,6 @@
+# git ls-files --others --exclude-from=.git/info/exclude
+# Lines that start with '#' are comments.
+# For a project mostly in C, the following would be a good set of
+# exclude patterns (uncomment them if you want to use them):
+# *.[oa]
+# *~

+ 1 - 0
testdata/repos/archives/dotGit/info/refs

@@ -0,0 +1 @@
+15fa60c13dccec6add267b7baa065977a6cc748a	refs/heads/main

BIN
testdata/repos/archives/dotGit/objects/info/commit-graph


+ 2 - 0
testdata/repos/archives/dotGit/objects/info/packs

@@ -0,0 +1,2 @@
+P pack-9d774732f0e985d717a26e126e6574d089375b0d.pack
+

BIN
testdata/repos/archives/dotGit/objects/pack/pack-9d774732f0e985d717a26e126e6574d089375b0d.idx


BIN
testdata/repos/archives/dotGit/objects/pack/pack-9d774732f0e985d717a26e126e6574d089375b0d.pack


BIN
testdata/repos/archives/dotGit/objects/pack/pack-9d774732f0e985d717a26e126e6574d089375b0d.rev


+ 2 - 0
testdata/repos/archives/dotGit/packed-refs

@@ -0,0 +1,2 @@
+# pack-refs with: peeled fully-peeled sorted 
+15fa60c13dccec6add267b7baa065977a6cc748a refs/heads/main

+ 0 - 0
testdata/repos/archives/dotGit/refs/.gitkeep


BIN
testdata/repos/archives/main.go.zst


Some files were not shown because too many files have changed in this diff