
Decode Base64 (#1488)

* Support base64 decoding during scans

* Add failing tests for the case when the match starts outside the decoded value

* Align locations for overlapping, encoded matches

* Add --max-decode-depth flag and update README help output

* Move keywords off of fragment

* Fix comment typo

* Refactor how keywords are used in detect

* Clarify comments and rename method to match

* Improve performance

* Fix issues after rebase

Only adjust matchIndex to fix the following-line issue when not working
with encoded segments, since those handle adjusting the index themselves.

* Update detect/decoder.go

Don't escape characters that don't need escaping

Co-authored-by: Richard Gomez <32133502+rgmz@users.noreply.github.com>

* Update flag help text

* Add metadata tags from decoding

---------

Co-authored-by: Richard Gomez <32133502+rgmz@users.noreply.github.com>
bplaxco 1 year ago
parent
commit
2278a2a97e
8 changed files with 714 additions and 49 deletions
  1. Makefile (+1 -1)
  2. README.md (+30 -1)
  3. cmd/root.go (+6 -0)
  4. detect/decoder.go (+281 -0)
  5. detect/decoder_test.go (+91 -0)
  6. detect/detect.go (+85 -47)
  7. detect/detect_test.go (+145 -0)
  8. testdata/config/base64_encoded.toml (+75 -0)

+ 1 - 1
Makefile

@@ -13,7 +13,7 @@ format:
 	go fmt ./...
 
 test: format
-	go test -v ./... --race $(PKG) 
+	go test -v ./... --race $(PKG)
 
 build: format
 	go mod tidy

+ 30 - 1
README.md

@@ -27,7 +27,7 @@
 
 ### Join our Discord! [![Discord](https://img.shields.io/discord/1102689410522284044.svg?label=&logo=discord&logoColor=ffffff&color=7389D8&labelColor=6A7EC2)](https://discord.gg/8Hzbrnkr7E)
 
-Gitleaks is a SAST tool for **detecting** and **preventing** hardcoded secrets like passwords, api keys, and tokens in git repos. Gitleaks is an **easy-to-use, all-in-one solution** for detecting secrets, past or present, in your code.
+Gitleaks is a SAST tool for **detecting** and **preventing** hardcoded secrets like passwords, API keys, and tokens in git repos. Gitleaks is an **easy-to-use, all-in-one solution** for detecting secrets, past or present, in your code.
 
 ```
 ➜  ~/code(master) gitleaks git -v
@@ -157,6 +157,7 @@ Flags:
   -h, --help                          help for gitleaks
       --ignore-gitleaks-allow         ignore gitleaks:allow comments
   -l, --log-level string              log level (trace, debug, info, warn, error, fatal) (default "info")
+      --max-decode-depth int          allow recursive decoding up to this depth (default "0", no decoding is done)
       --max-target-megabytes int      files larger than this will be skipped
       --no-banner                     suppress banner
       --no-color                      turn off color for verbose output
@@ -360,7 +361,35 @@ class CustomClass:
 
 You can ignore specific findings by creating a `.gitleaksignore` file at the root of your repo. In release v8.10.0 Gitleaks added a `Fingerprint` value to the Gitleaks report. Each leak, or finding, has a Fingerprint that uniquely identifies a secret. Add this fingerprint to the `.gitleaksignore` file to ignore that specific secret. See Gitleaks' [.gitleaksignore](https://github.com/gitleaks/gitleaks/blob/master/.gitleaksignore) for an example. Note: this feature is experimental and is subject to change in the future.
 
+#### Decoding
+
+Sometimes secrets are encoded in a way that makes them difficult to find
+with regex alone. Now you can tell gitleaks to automatically find and decode
+encoded text. The flag `--max-decode-depth` enables this feature (the
+default value "0" disables decoding).
+
+Recursive decoding is supported since decoded text can itself contain
+encoded text. The flag `--max-decode-depth` sets the recursion limit.
+Recursion stops when there are no new segments of encoded text to decode,
+so setting a very high max depth doesn't mean that many passes will be
+made: it only makes as many as it needs to decode the text. Overall,
+decoding only minimally increases scan times.
+
+The findings for encoded text differ from normal findings in the following
+ways:
+
+- The location points to the bounds of the encoded text
+  - If the rule matches outside the encoded text, the bounds are adjusted to
+    include that as well
+- The match and secret contain the decoded value
+- Two tags are added: `decoded:<encoding>` and `decode-depth:<depth>`
+
+Currently supported encodings:
+
+- `base64` (both standard and base64url)
+
 ## Sponsorships
+
 <p align="left">
 	<h3><a href="https://coderabbit.ai/?utm_source=oss&utm_medium=sponsorship&utm_campaign=gitleaks">coderabbit.ai</a></h3>
 	  <a href="https://coderabbit.ai/?utm_source=oss&utm_medium=sponsorship&utm_campaign=gitleaks">
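
As a quick usage illustration of the new README section above (a sketch: the `git` subcommand mirrors the README's own example, and the depth value here is arbitrary):

```
# scan the current repo, decoding encoded text up to two levels deep
➜  ~/code(master) gitleaks git -v --max-decode-depth 2
```

Findings produced from decoded text carry the `decoded:base64` and `decode-depth:<depth>` tags described in the Decoding section.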

+ 6 - 0
cmd/root.go

@@ -56,6 +56,7 @@ func init() {
 	rootCmd.PersistentFlags().Bool("no-banner", false, "suppress banner")
 	rootCmd.PersistentFlags().StringSlice("enable-rule", []string{}, "only enable specific rules by id")
 	rootCmd.PersistentFlags().StringP("gitleaks-ignore-path", "i", ".", "path to .gitleaksignore file or folder containing one")
+	rootCmd.PersistentFlags().Int("max-decode-depth", 0, "allow recursive decoding up to this depth (default \"0\", no decoding is done)")
 	err := viper.BindPFlag("config", rootCmd.PersistentFlags().Lookup("config"))
 	if err != nil {
 		log.Fatal().Msgf("err binding config %s", err.Error())
@@ -170,6 +171,11 @@ func Detector(cmd *cobra.Command, cfg config.Config, source string) *detect.Dete
 
 	// Setup common detector
 	detector := detect.NewDetector(cfg)
+
+	if detector.MaxDecodeDepth, err = cmd.Flags().GetInt("max-decode-depth"); err != nil {
+		log.Fatal().Err(err).Msg("")
+	}
+
 	// set color flag at first
 	if detector.NoColor, err = cmd.Flags().GetBool("no-color"); err != nil {
 		log.Fatal().Err(err).Msg("")

+ 281 - 0
detect/decoder.go

@@ -0,0 +1,281 @@
+package detect
+
+import (
+	"bytes"
+	"encoding/base64"
+	"fmt"
+	"regexp"
+	"unicode"
+
+	"github.com/rs/zerolog/log"
+)
+
+var b64LikelyChars [128]byte
+var b64Regexp = regexp.MustCompile(`[\w/+-]{16,}={0,3}`)
+var decoders = []func(string) ([]byte, error){
+	base64.StdEncoding.DecodeString,
+	base64.RawURLEncoding.DecodeString,
+}
+
+func init() {
+	// Basically look for anything that isn't just letters
+	for _, c := range `0123456789+/-_` {
+		b64LikelyChars[c] = 1
+	}
+}
+
+// EncodedSegment represents a portion of text that is encoded in some way.
+// `decode` supports recursive decoding and can result in "segment trees".
+// There can be multiple segments in the original text, so each can be thought
+// of as its own tree with the root being the original segment.
+type EncodedSegment struct {
+	// The parent segment in a segment tree. If nil, it is a root segment
+	parent *EncodedSegment
+
+	// Relative start/end are the bounds of the encoded value in the current pass.
+	relativeStart int
+	relativeEnd   int
+
+	// Absolute start/end refer to the bounds of the root segment in this segment
+	// tree
+	absoluteStart int
+	absoluteEnd   int
+
+	// Decoded start/end refer to the bounds of the decoded value in the current
+	// pass. These can differ from relative values because decoding can shrink
+	// or grow the size of the segment.
+	decodedStart int
+	decodedEnd   int
+
+	// This is the actual decoded content in the segment
+	decodedValue string
+
+	// This is the type of encoding
+	encoding string
+}
+
+// isChildOf inspects the bounds of two segments to determine
+// if one should be the child of another
+func (s EncodedSegment) isChildOf(parent EncodedSegment) bool {
+	return parent.decodedStart <= s.relativeStart && parent.decodedEnd >= s.relativeEnd
+}
+
+// decodedOverlaps checks if the decoded bounds of the segment overlap a range
+func (s EncodedSegment) decodedOverlaps(start, end int) bool {
+	return start <= s.decodedEnd && end >= s.decodedStart
+}
+
+// adjustMatchIndex takes the matchIndex from the current decoding pass and
+// updates it to match the absolute matchIndex in the original text.
+func (s EncodedSegment) adjustMatchIndex(matchIndex []int) []int {
+	// The match is within the bounds of the segment so we just return
+	// the absolute start and end of the root segment.
+	if s.decodedStart <= matchIndex[0] && matchIndex[1] <= s.decodedEnd {
+		return []int{
+			s.absoluteStart,
+			s.absoluteEnd,
+		}
+	}
+
+	// Since it overlaps one side and/or the other, we're going to have to adjust
+	// and climb parents until we're either at the root or we've determined
+	// we're fully inside one of the parent segments.
+	adjustedMatchIndex := make([]int, 2)
+
+	if matchIndex[0] < s.decodedStart {
+		// It starts before the encoded segment so adjust the start to match
+		// the location before it was decoded
+		matchStartDelta := s.decodedStart - matchIndex[0]
+		adjustedMatchIndex[0] = s.relativeStart - matchStartDelta
+	} else {
+		// It starts within the encoded segment so set the bound to the
+		// relative start
+		adjustedMatchIndex[0] = s.relativeStart
+	}
+
+	if matchIndex[1] > s.decodedEnd {
+		// It ends after the encoded segment so adjust the end to match
+		// the location before it was decoded
+		matchEndDelta := matchIndex[1] - s.decodedEnd
+		adjustedMatchIndex[1] = s.relativeEnd + matchEndDelta
+	} else {
+		// It ends within the encoded segment so set the bound to the relative end
+		adjustedMatchIndex[1] = s.relativeEnd
+	}
+
+	// We're still not at a root segment so we'll need to keep on adjusting
+	if s.parent != nil {
+		return s.parent.adjustMatchIndex(adjustedMatchIndex)
+	}
+
+	return adjustedMatchIndex
+}
+
+// depth reports how many levels of decoding were needed (the minimum is 1)
+func (s EncodedSegment) depth() int {
+	depth := 1
+
+	// Climb the tree and increment the depth
+	for current := &s; current.parent != nil; current = current.parent {
+		depth++
+	}
+
+	return depth
+}
+
+// tags returns additional metadata tags related to the types of segments
+func (s EncodedSegment) tags() []string {
+	return []string{
+		fmt.Sprintf("decoded:%s", s.encoding),
+		fmt.Sprintf("decode-depth:%d", s.depth()),
+	}
+}
+
+// Decoder decodes various types of data in place
+type Decoder struct {
+	decodedMap map[string]string
+}
+
+// NewDecoder creates a default decoder struct
+func NewDecoder() *Decoder {
+	return &Decoder{
+		decodedMap: make(map[string]string),
+	}
+}
+
+// decode returns the data with the values decoded in-place
+func (d *Decoder) decode(data string, parentSegments []EncodedSegment) (string, []EncodedSegment) {
+	segments := d.findEncodedSegments(data, parentSegments)
+
+	if len(segments) > 0 {
+		result := bytes.NewBuffer(make([]byte, 0, len(data)))
+
+		relativeStart := 0
+		for _, segment := range segments {
+			result.WriteString(data[relativeStart:segment.relativeStart])
+			result.WriteString(segment.decodedValue)
+			relativeStart = segment.relativeEnd
+		}
+		result.WriteString(data[relativeStart:])
+
+		return result.String(), segments
+	}
+
+	return data, segments
+}
+
+// findEncodedSegments finds the encoded segments in the data and updates the
+// segment tree for this pass
+func (d *Decoder) findEncodedSegments(data string, parentSegments []EncodedSegment) []EncodedSegment {
+	if len(data) == 0 {
+		return []EncodedSegment{}
+	}
+
+	matchIndices := b64Regexp.FindAllStringIndex(data, -1)
+	if matchIndices == nil {
+		return []EncodedSegment{}
+	}
+
+	segments := make([]EncodedSegment, 0, len(matchIndices))
+
+	// Keeps up with offsets from the text changing size as things are decoded
+	decodedShift := 0
+
+	for _, matchIndex := range matchIndices {
+		encodedValue := data[matchIndex[0]:matchIndex[1]]
+
+		if !isLikelyB64(encodedValue) {
+			d.decodedMap[encodedValue] = ""
+			continue
+		}
+
+		decodedValue, alreadyDecoded := d.decodedMap[encodedValue]
+
+		// We haven't decoded this yet, so go ahead and decode it
+		if !alreadyDecoded {
+			decodedValue = decodeValue(encodedValue)
+			d.decodedMap[encodedValue] = decodedValue
+		}
+
+		// Skip this segment because there was nothing to check
+		if len(decodedValue) == 0 {
+			continue
+		}
+
+		// Create a segment for the encoded data
+		segment := EncodedSegment{
+			relativeStart: matchIndex[0],
+			relativeEnd:   matchIndex[1],
+			absoluteStart: matchIndex[0],
+			absoluteEnd:   matchIndex[1],
+			decodedStart:  matchIndex[0] + decodedShift,
+			decodedEnd:    matchIndex[0] + decodedShift + len(decodedValue),
+			decodedValue:  decodedValue,
+			encoding:      "base64",
+		}
+
+		// Shift decoded start and ends based on size changes
+		decodedShift += len(decodedValue) - len(encodedValue)
+
+		// Adjust the absolute position of segments contained in parent segments
+		for _, parentSegment := range parentSegments {
+			if segment.isChildOf(parentSegment) {
+				segment.absoluteStart = parentSegment.absoluteStart
+				segment.absoluteEnd = parentSegment.absoluteEnd
+				segment.parent = &parentSegment
+				break
+			}
+		}
+
+		log.Debug().Msgf("segment found: %#v", segment)
+		segments = append(segments, segment)
+	}
+
+	return segments
+}
+
+// decodeValue tries the list of decoders and returns the first successful, ASCII-looking result
+func decodeValue(encodedValue string) string {
+	for _, decoder := range decoders {
+		decodedValue, err := decoder(encodedValue)
+
+		if err == nil && len(decodedValue) > 0 && isASCII(decodedValue) {
+			return string(decodedValue)
+		}
+	}
+
+	return ""
+}
+
+func isASCII(b []byte) bool {
+	for i := 0; i < len(b); i++ {
+		if b[i] > unicode.MaxASCII || b[i] < '\t' {
+			return false
+		}
+	}
+
+	return true
+}
+
+// isLikelyB64 skips values that are probably just identifiers (method names,
+// signatures, etc.) at the risk of missing about 1% of real base64
+func isLikelyB64(s string) bool {
+	for _, c := range s {
+		if b64LikelyChars[c] != 0 {
+			return true
+		}
+	}
+
+	return false
+}
+
+// segmentWithDecodedOverlap finds a segment whose decoded bounds overlap the given range
+func segmentWithDecodedOverlap(encodedSegments []EncodedSegment, start, end int) *EncodedSegment {
+	for _, segment := range encodedSegments {
+		if segment.decodedOverlaps(start, end) {
+			return &segment
+		}
+	}
+
+	return nil
+}
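
A minimal standalone sketch of the decoder-fallback pattern that `decodeValue` uses above, written against the standard library only (`tryDecode` is a hypothetical name for illustration):

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// tryDecode mirrors decodeValue's fallback: attempt each base64 variant in
// order and keep the first decode that succeeds and is non-empty.
func tryDecode(encoded string) string {
	decoders := []func(string) ([]byte, error){
		base64.StdEncoding.DecodeString,    // standard alphabet, padded
		base64.RawURLEncoding.DecodeString, // URL-safe alphabet, unpadded
	}
	for _, decode := range decoders {
		if out, err := decode(encoded); err == nil && len(out) > 0 {
			return string(out)
		}
	}
	return "" // nothing decodable
}

func main() {
	fmt.Println(tryDecode("bG9uZ2VyLWVuY29kZWQtc2VjcmV0LXRlc3Q=")) // longer-encoded-secret-test
	fmt.Println(tryDecode("dHJ1ZmZsZWhvZz4-ZmluZHMtc2VjcmV0cw"))   // trufflehog>>finds-secrets
}
```

The real implementation additionally gates results behind `isASCII`, so binary-looking decodes are discarded.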

+ 91 - 0
detect/decoder_test.go

@@ -0,0 +1,91 @@
+package detect
+
+import (
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+)
+
+func TestDecode(t *testing.T) {
+	tests := []struct {
+		chunk    string
+		expected string
+		name     string
+	}{
+		{
+			name:     "only b64 chunk",
+			chunk:    `bG9uZ2VyLWVuY29kZWQtc2VjcmV0LXRlc3Q=`,
+			expected: `longer-encoded-secret-test`,
+		},
+		{
+			name:     "mixed content",
+			chunk:    `token: bG9uZ2VyLWVuY29kZWQtc2VjcmV0LXRlc3Q=`,
+			expected: `token: longer-encoded-secret-test`,
+		},
+		{
+			name:     "no chunk",
+			chunk:    ``,
+			expected: ``,
+		},
+		{
+			name:     "env var (looks like all b64 decodable but has `=` in the middle)",
+			chunk:    `some-encoded-secret=dGVzdC1zZWNyZXQtdmFsdWU=`,
+			expected: `some-encoded-secret=test-secret-value`,
+		},
+		{
+			name:     "has longer b64 inside",
+			chunk:    `some-encoded-secret="bG9uZ2VyLWVuY29kZWQtc2VjcmV0LXRlc3Q="`,
+			expected: `some-encoded-secret="longer-encoded-secret-test"`,
+		},
+		{
+			name: "many possible i := 0substrings",
+			chunk: `Many substrings in this slack message could be base64 decoded
+				but only dGhpcyBlbmNhcHN1bGF0ZWQgc2VjcmV0 should be decoded.`,
+			expected: `Many substrings in this slack message could be base64 decoded
+				but only this encapsulated secret should be decoded.`,
+		},
+		{
+			name:     "b64-url-safe: only b64 chunk",
+			chunk:    `bG9uZ2VyLWVuY29kZWQtc2VjcmV0LXRlc3Q`,
+			expected: `longer-encoded-secret-test`,
+		},
+		{
+			name:     "b64-url-safe: mixed content",
+			chunk:    `token: bG9uZ2VyLWVuY29kZWQtc2VjcmV0LXRlc3Q`,
+			expected: `token: longer-encoded-secret-test`,
+		},
+		{
+			name:     "b64-url-safe: env var (looks like all b64 decodable but has `=` in the middle)",
+			chunk:    `some-encoded-secret=dGVzdC1zZWNyZXQtdmFsdWU=`,
+			expected: `some-encoded-secret=test-secret-value`,
+		},
+		{
+			name:     "b64-url-safe: has longer b64 inside",
+			chunk:    `some-encoded-secret="bG9uZ2VyLWVuY29kZWQtc2VjcmV0LXRlc3Q"`,
+			expected: `some-encoded-secret="longer-encoded-secret-test"`,
+		},
+		{
+			name:     "b64-url-safe: hyphen url b64",
+			chunk:    `dHJ1ZmZsZWhvZz4-ZmluZHMtc2VjcmV0cw`,
+			expected: `trufflehog>>finds-secrets`,
+		},
+		{
+			name:     "b64-url-safe: underscore url b64",
+			chunk:    `YjY0dXJsc2FmZS10ZXN0LXNlY3JldC11bmRlcnNjb3Jlcz8_`,
+			expected: `b64urlsafe-test-secret-underscores??`,
+		},
+		{
+			name:     "invalid base64 string",
+			chunk:    `a3d3fa7c2bb99e469ba55e5834ce79ee4853a8a3`,
+			expected: `a3d3fa7c2bb99e469ba55e5834ce79ee4853a8a3`,
+		},
+	}
+
+	decoder := NewDecoder()
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			decoded, _ := decoder.decode(tt.chunk, []EncodedSegment{})
+			assert.Equal(t, tt.expected, decoded)
+		})
+	}
+}
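
These cases hinge on the candidate pattern `[\w/+-]{16,}={0,3}` from decoder.go. A small sketch (hypothetical standalone program) showing that the regex only nominates candidates, which `isLikelyB64` and the ASCII check then filter:

```go
package main

import (
	"fmt"
	"regexp"
)

// Same candidate pattern as decoder.go: 16+ base64-ish characters plus
// up to three padding characters.
var b64Regexp = regexp.MustCompile(`[\w/+-]{16,}={0,3}`)

func main() {
	chunk := `token: bG9uZ2VyLWVuY29kZWQtc2VjcmV0LXRlc3Q= sha: a3d3fa7c2bb99e469ba55e5834ce79ee4853a8a3`
	// Both the real base64 and the hex SHA are nominated; the SHA later
	// decodes to non-printable bytes and is dropped by the ASCII check,
	// which is why the "invalid base64 string" test expects a passthrough.
	for _, candidate := range b64Regexp.FindAllString(chunk, -1) {
		fmt.Println(candidate)
	}
}
```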

+ 85 - 47
detect/detect.go

@@ -25,6 +25,8 @@ const (
 	chunkSize              = 10 * 1_000 // 10kb
 )
 
+var newLineRegexp = regexp.MustCompile("\n")
+
 // Detector is the main detector struct
 type Detector struct {
 	// Config is the configuration for the detector
@@ -38,6 +40,9 @@ type Detector struct {
 	// verbose is a flag to print findings
 	Verbose bool
 
+	// MaxDecodeDepth limits how many recursive decoding passes are allowed
+	MaxDecodeDepth int
+
 	// files larger than this will be skipped
 	MaxTargetMegaBytes int
 
@@ -95,10 +100,6 @@ type Fragment struct {
 	// newlineIndices is a list of indices of newlines in the raw content.
 	// This is used to calculate the line location of a finding
 	newlineIndices [][]int
-
-	// keywords is a map of all the keywords contain within the contents
-	// of this fragment
-	keywords map[string]bool
 }
 
 // NewDetector creates a new detector with the given config
@@ -175,9 +176,6 @@ func (d *Detector) DetectString(content string) []report.Finding {
 func (d *Detector) Detect(fragment Fragment) []report.Finding {
 	var findings []report.Finding
 
-	// initiate fragment keywords
-	fragment.keywords = make(map[string]bool)
-
 	// check if filepath is allowed
 	if fragment.FilePath != "" && (d.Config.Allowlist.PathAllowed(fragment.FilePath) ||
 		fragment.FilePath == d.Config.Path || (d.baselinePath != "" && fragment.FilePath == d.baselinePath)) {
@@ -185,38 +183,62 @@ func (d *Detector) Detect(fragment Fragment) []report.Finding {
 	}
 
 	// add newline indices for location calculation in detectRule
-	fragment.newlineIndices = regexp.MustCompile("\n").FindAllStringIndex(fragment.Raw, -1)
+	fragment.newlineIndices = newLineRegexp.FindAllStringIndex(fragment.Raw, -1)
+
+	// setup variables to handle different decoding passes
+	currentRaw := fragment.Raw
+	encodedSegments := []EncodedSegment{}
+	currentDecodeDepth := 0
+	decoder := NewDecoder()
+
+	for {
+		// build keyword map for prefiltering rules
+		keywords := make(map[string]bool)
+		normalizedRaw := strings.ToLower(currentRaw)
+		matches := d.prefilter.MatchString(normalizedRaw)
+		for _, m := range matches {
+			keywords[normalizedRaw[m.Pos():int(m.Pos())+len(m.Match())]] = true
+		}
 
-	// build keyword map for prefiltering rules
-	normalizedRaw := strings.ToLower(fragment.Raw)
-	matches := d.prefilter.MatchString(normalizedRaw)
-	for _, m := range matches {
-		fragment.keywords[normalizedRaw[m.Pos():int(m.Pos())+len(m.Match())]] = true
-	}
+		for _, rule := range d.Config.Rules {
+			if len(rule.Keywords) == 0 {
+				// if no keywords are associated with the rule always scan the
+				// fragment using the rule
+				findings = append(findings, d.detectRule(fragment, currentRaw, rule, encodedSegments)...)
+				continue
+			}
 
-	for _, rule := range d.Config.Rules {
-		if len(rule.Keywords) == 0 {
-			// if not keywords are associated with the rule always scan the
-			// fragment using the rule
-			findings = append(findings, d.detectRule(fragment, rule)...)
-			continue
-		}
-		fragmentContainsKeyword := false
-		// check if keywords are in the fragment
-		for _, k := range rule.Keywords {
-			if _, ok := fragment.keywords[strings.ToLower(k)]; ok {
-				fragmentContainsKeyword = true
+			// check if keywords are in the fragment
+			for _, k := range rule.Keywords {
+				if _, ok := keywords[strings.ToLower(k)]; ok {
+					findings = append(findings, d.detectRule(fragment, currentRaw, rule, encodedSegments)...)
+					break
+				}
 			}
 		}
-		if fragmentContainsKeyword {
-			findings = append(findings, d.detectRule(fragment, rule)...)
+
+		// increment the depth by 1 as we start our decoding pass
+		currentDecodeDepth++
+
+		// stop the loop if we've hit our max decoding depth
+		if currentDecodeDepth > d.MaxDecodeDepth {
+			break
+		}
+
+		// decode the currentRaw for the next pass
+		currentRaw, encodedSegments = decoder.decode(currentRaw, encodedSegments)
+
+		// stop the loop when there's nothing else to decode
+		if len(encodedSegments) == 0 {
+			break
 		}
 	}
+
 	return filter(findings, d.Redact)
 }
 
 // detectRule scans the given fragment for the given rule and returns a list of findings
-func (d *Detector) detectRule(fragment Fragment, rule config.Rule) []report.Finding {
+func (d *Detector) detectRule(fragment Fragment, currentRaw string, rule config.Rule, encodedSegments []EncodedSegment) []report.Finding {
 	var findings []report.Finding
 
 	// check if filepath or commit is allowed for this rule
@@ -225,7 +247,7 @@ func (d *Detector) detectRule(fragment Fragment, rule config.Rule) []report.Find
 		return findings
 	}
 
-	if rule.Path != nil && rule.Regex == nil {
+	if rule.Path != nil && rule.Regex == nil && len(encodedSegments) == 0 {
 		// Path _only_ rule
 		if rule.Path.MatchString(fragment.FilePath) {
 			finding := report.Finding{
@@ -252,23 +274,39 @@ func (d *Detector) detectRule(fragment Fragment, rule config.Rule) []report.Find
 		return findings
 	}
 
-	// If flag configure and raw data size bigger then the flag
+	// if the flag is configured and the raw data size is bigger than the limit
 	if d.MaxTargetMegaBytes > 0 {
-		rawLength := len(fragment.Raw) / 1000000
+		rawLength := len(currentRaw) / 1000000
 		if rawLength > d.MaxTargetMegaBytes {
 			log.Debug().Msgf("skipping file: %s scan due to size: %d", fragment.FilePath, rawLength)
 			return findings
 		}
 	}
 
-	matchIndices := rule.Regex.FindAllStringIndex(fragment.Raw, -1)
-	for _, matchIndex := range matchIndices {
-		// extract secret from match
-		secret := strings.Trim(fragment.Raw[matchIndex[0]:matchIndex[1]], "\n")
-
-		// Fixes: https://github.com/gitleaks/gitleaks/issues/1352
-		// removes the incorrectly following line that was detected by regex expression '\n'
-		matchIndex[1] = matchIndex[0] + len(secret)
+	// use currentRaw instead of fragment.Raw since this represents the current
+	// decoding pass on the text
+	for _, matchIndex := range rule.Regex.FindAllStringIndex(currentRaw, -1) {
+		// Extract secret from match
+		secret := strings.Trim(currentRaw[matchIndex[0]:matchIndex[1]], "\n")
+
+		// Collect any metadata tags produced by decoding
+		var metaTags []string
+
+		// Check if the decoded portions of the segment overlap with the match
+		// to see if it's potentially a new match
+		if len(encodedSegments) > 0 {
+			if segment := segmentWithDecodedOverlap(encodedSegments, matchIndex[0], matchIndex[1]); segment != nil {
+				matchIndex = segment.adjustMatchIndex(matchIndex)
+				metaTags = append(metaTags, segment.tags()...)
+			} else {
+				// This item has already been added to a finding
+				continue
+			}
+		} else {
+			// Fixes: https://github.com/gitleaks/gitleaks/issues/1352
+			// removes the following line that was incorrectly captured by a regex matching '\n'
+			matchIndex[1] = matchIndex[0] + len(secret)
+		}
 
 		// determine location of match. Note that the location
 		// in the finding will be the line/column numbers of the _match_
@@ -291,7 +329,7 @@ func (d *Detector) detectRule(fragment Fragment, rule config.Rule) []report.Find
 			EndColumn:   loc.endColumn,
 			Secret:      secret,
 			Match:       secret,
-			Tags:        rule.Tags,
+			Tags:        append(rule.Tags, metaTags...),
 			Line:        fragment.Raw[loc.startLineIndex:loc.endLineIndex],
 		}
 
@@ -327,6 +365,12 @@ func (d *Detector) detectRule(fragment Fragment, rule config.Rule) []report.Find
 			}
 		}
 
+		// check if the secret is in the list of stopwords
+		if rule.Allowlist.ContainsStopWord(finding.Secret) ||
+			d.Config.Allowlist.ContainsStopWord(finding.Secret) {
+			continue
+		}
+
 		// check if the regexTarget is defined in the allowlist "regexes" entry
 		allowlistTarget := finding.Secret
 		switch rule.Allowlist.RegexTarget {
@@ -348,12 +392,6 @@ func (d *Detector) detectRule(fragment Fragment, rule config.Rule) []report.Find
 			continue
 		}
 
-		// check if the secret is in the list of stopwords
-		if rule.Allowlist.ContainsStopWord(finding.Secret) ||
-			d.Config.Allowlist.ContainsStopWord(finding.Secret) {
-			continue
-		}
-
 		// check entropy
 		entropy := shannonEntropy(finding.Secret)
 		finding.Entropy = float32(entropy)
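
To make the overlap adjustment above concrete, here is a hand-worked sketch of the `adjustMatchIndex` bounds math for the `secret=ZGVjb2RlZC1zZWNyZXQtdmFsdWU=` test case (single-level only; `segment.adjust` is a hypothetical simplification of the real method):

```go
package main

import "fmt"

// A single-level simplification of EncodedSegment.adjustMatchIndex: map a
// match found in decoded text back onto the original, encoded text.
type segment struct {
	relativeStart, relativeEnd int // bounds of the encoded run in the original text
	decodedStart, decodedEnd   int // bounds of the decoded value in the decoded text
}

func (s segment) adjust(match [2]int) [2]int {
	// fully inside the decoded value: report the whole encoded run
	if s.decodedStart <= match[0] && match[1] <= s.decodedEnd {
		return [2]int{s.relativeStart, s.relativeEnd}
	}
	out := [2]int{s.relativeStart, s.relativeEnd}
	if match[0] < s.decodedStart {
		out[0] = s.relativeStart - (s.decodedStart - match[0]) // grow left by the overhang
	}
	if match[1] > s.decodedEnd {
		out[1] = s.relativeEnd + (match[1] - s.decodedEnd) // grow right by the overhang
	}
	return out
}

func main() {
	// Original: secret=ZGVjb2RlZC1zZWNyZXQtdmFsdWU=  (encoded run at [7,35))
	// Decoded:  secret=decoded-secret-value          (decoded run at [7,27))
	s := segment{relativeStart: 7, relativeEnd: 35, decodedStart: 7, decodedEnd: 27}
	match := [2]int{0, 27} // "secret=decoded-secret-value" in the decoded text
	fmt.Println(s.adjust(match)) // [0 35]: the full secret=...= span in the original
}
```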

+ 145 - 0
detect/detect_test.go

@@ -15,8 +15,31 @@ import (
 	"github.com/zricethezav/gitleaks/v8/sources"
 )
 
+const maxDecodeDepth = 8
 const configPath = "../testdata/config/"
 const repoBasePath = "../testdata/repos/"
+const b64TestValues = `
+# Decoded
+-----BEGIN PRIVATE KEY-----
+135f/bRUBHrbHqLY/xS3I7Oth+8rgG+0tBwfMcbk05Sgxq6QUzSYIQAop+WvsTwk2sR+C38g0Mnb
+u+QDkg0spw==
+-----END PRIVATE KEY-----
+
+# Encoded
+private_key: 'LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0tCjQzNWYvYlJVQkhyYkhxTFkveFMzSTdPdGgrOHJnRyswdEJ3Zk1jYmswNVNneHE2UVV6U1lJUUFvcCtXdnNUd2syc1IrQzM4ZzBNbmIKdStRRGtnMHNwdz09Ci0tLS0tRU5EIFBSSVZBVEUgS0VZLS0tLS0K'
+
+# Double Encoded: b64 encoded aws config inside a jwt
+eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwiY29uZmlnIjoiVzJSbFptRjFiSFJkQ25KbFoybHZiaUE5SUhWekxXVmhjM1F0TWdwaGQzTmZZV05qWlhOelgydGxlVjlwWkNBOUlFRlRTVUZKVDFOR1QwUk9UamRNV0UweE1FcEpDbUYzYzE5elpXTnlaWFJmWVdOalpYTnpYMnRsZVNBOUlIZEtZV3h5V0ZWMGJrWkZUVWt2U3pkTlJFVk9SeTlpVUhoU1ptbERXVVZHVlVORWJFVllNVUVLIiwiaWF0IjoxNTE2MjM5MDIyfQ.8gxviXEOuIBQk2LvTYHSf-wXVhnEKC3h4yM5nlOF4zA
+
+# A small secret at the end to make sure that as the other ones above shrink
+# when decoded, the positions are taken into consideration for overlaps
+c21hbGwtc2VjcmV0
+
+# This tests how it handles when the match bounds go outside the decoded value
+secret=ZGVjb2RlZC1zZWNyZXQtdmFsdWU=
+# The above encoded again
+c2VjcmV0PVpHVmpiMlJsWkMxelpXTnlaWFF0ZG1Gc2RXVT0=
+`
 
 func TestDetect(t *testing.T) {
 	tests := []struct {
@@ -330,6 +353,127 @@ func TestDetect(t *testing.T) {
 			},
 			expectedFindings: []report.Finding{},
 		},
+		{
+			cfgName: "base64_encoded",
+			fragment: Fragment{
+				Raw:      b64TestValues,
+				FilePath: "tmp.go",
+			},
+			expectedFindings: []report.Finding{
+				{ // Plain text key captured by normal rule
+					Description: "Private Key",
+					Secret:      "-----BEGIN PRIVATE KEY-----\n135f/bRUBHrbHqLY/xS3I7Oth+8rgG+0tBwfMcbk05Sgxq6QUzSYIQAop+WvsTwk2sR+C38g0Mnb\nu+QDkg0spw==\n-----END PRIVATE KEY-----",
+					Match:       "-----BEGIN PRIVATE KEY-----\n135f/bRUBHrbHqLY/xS3I7Oth+8rgG+0tBwfMcbk05Sgxq6QUzSYIQAop+WvsTwk2sR+C38g0Mnb\nu+QDkg0spw==\n-----END PRIVATE KEY-----",
+					File:        "tmp.go",
+					Line:        "\n-----BEGIN PRIVATE KEY-----\n135f/bRUBHrbHqLY/xS3I7Oth+8rgG+0tBwfMcbk05Sgxq6QUzSYIQAop+WvsTwk2sR+C38g0Mnb\nu+QDkg0spw==\n-----END PRIVATE KEY-----",
+					RuleID:      "private-key",
+					Tags:        []string{"key", "private"},
+					StartLine:   2,
+					EndLine:     5,
+					StartColumn: 2,
+					EndColumn:   26,
+					Entropy:     5.350665,
+				},
+				{ // Encoded key captured by custom b64 regex rule
+					Description: "Private Key",
+					Secret:      "LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0tCjQzNWYvYlJVQkhyYkhxTFkveFMzSTdPdGgrOHJnRyswdEJ3Zk1jYmswNVNneHE2UVV6U1lJUUFvcCtXdnNUd2syc1IrQzM4ZzBNbmIKdStRRGtnMHNwdz09Ci0tLS0tRU5EIFBSSVZBVEUgS0VZLS0tLS0K",
+					Match:       "LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0tCjQzNWYvYlJVQkhyYkhxTFkveFMzSTdPdGgrOHJnRyswdEJ3Zk1jYmswNVNneHE2UVV6U1lJUUFvcCtXdnNUd2syc1IrQzM4ZzBNbmIKdStRRGtnMHNwdz09Ci0tLS0tRU5EIFBSSVZBVEUgS0VZLS0tLS0K",
+					File:        "tmp.go",
+					Line:        "\nprivate_key: 'LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0tCjQzNWYvYlJVQkhyYkhxTFkveFMzSTdPdGgrOHJnRyswdEJ3Zk1jYmswNVNneHE2UVV6U1lJUUFvcCtXdnNUd2syc1IrQzM4ZzBNbmIKdStRRGtnMHNwdz09Ci0tLS0tRU5EIFBSSVZBVEUgS0VZLS0tLS0K'",
+					RuleID:      "b64-encoded-private-key",
+					Tags:        []string{"key", "private"},
+					StartLine:   8,
+					EndLine:     8,
+					StartColumn: 16,
+					EndColumn:   207,
+					Entropy:     5.3861146,
+				},
+				{ // Encoded key captured by plain text rule using the decoder
+					Description: "Private Key",
+					Secret:      "-----BEGIN PRIVATE KEY-----\n435f/bRUBHrbHqLY/xS3I7Oth+8rgG+0tBwfMcbk05Sgxq6QUzSYIQAop+WvsTwk2sR+C38g0Mnb\nu+QDkg0spw==\n-----END PRIVATE KEY-----",
+					Match:       "-----BEGIN PRIVATE KEY-----\n435f/bRUBHrbHqLY/xS3I7Oth+8rgG+0tBwfMcbk05Sgxq6QUzSYIQAop+WvsTwk2sR+C38g0Mnb\nu+QDkg0spw==\n-----END PRIVATE KEY-----",
+					File:        "tmp.go",
+					Line:        "\nprivate_key: 'LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0tCjQzNWYvYlJVQkhyYkhxTFkveFMzSTdPdGgrOHJnRyswdEJ3Zk1jYmswNVNneHE2UVV6U1lJUUFvcCtXdnNUd2syc1IrQzM4ZzBNbmIKdStRRGtnMHNwdz09Ci0tLS0tRU5EIFBSSVZBVEUgS0VZLS0tLS0K'",
+					RuleID:      "private-key",
+					Tags:        []string{"key", "private", "decoded:base64", "decode-depth:1"},
+					StartLine:   8,
+					EndLine:     8,
+					StartColumn: 16,
+					EndColumn:   207,
+					Entropy:     5.350665,
+				},
+				{ // Encoded AWS config with a access key id inside a JWT
+					Description: "AWS IAM Unique Identifier",
+					Secret:      "ASIAIOSFODNN7LXM10JI",
+					Match:       " ASIAIOSFODNN7LXM10JI",
+					File:        "tmp.go",
+					Line:        "\neyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwiY29uZmlnIjoiVzJSbFptRjFiSFJkQ25KbFoybHZiaUE5SUhWekxXVmhjM1F0TWdwaGQzTmZZV05qWlhOelgydGxlVjlwWkNBOUlFRlRTVUZKVDFOR1QwUk9UamRNV0UweE1FcEpDbUYzYzE5elpXTnlaWFJmWVdOalpYTnpYMnRsZVNBOUlIZEtZV3h5V0ZWMGJrWkZUVWt2U3pkTlJFVk9SeTlpVUhoU1ptbERXVVZHVlVORWJFVllNVUVLIiwiaWF0IjoxNTE2MjM5MDIyfQ.8gxviXEOuIBQk2LvTYHSf-wXVhnEKC3h4yM5nlOF4zA",
+					RuleID:      "aws-iam-unique-identifier",
+					Tags:        []string{"aws", "identifier", "decoded:base64", "decode-depth:2"},
+					StartLine:   11,
+					EndLine:     11,
+					StartColumn: 39,
+					EndColumn:   344,
+					Entropy:     3.6841838,
+				},
+				{ // Encoded AWS config with a secret access key inside a JWT
+					Description: "AWS Secret Access Key",
+					Secret:      "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEFUCDlEX1A",
+					Match:       "aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEFUCDlEX1A",
+					File:        "tmp.go",
+					Line:        "\neyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwiY29uZmlnIjoiVzJSbFptRjFiSFJkQ25KbFoybHZiaUE5SUhWekxXVmhjM1F0TWdwaGQzTmZZV05qWlhOelgydGxlVjlwWkNBOUlFRlRTVUZKVDFOR1QwUk9UamRNV0UweE1FcEpDbUYzYzE5elpXTnlaWFJmWVdOalpYTnpYMnRsZVNBOUlIZEtZV3h5V0ZWMGJrWkZUVWt2U3pkTlJFVk9SeTlpVUhoU1ptbERXVVZHVlVORWJFVllNVUVLIiwiaWF0IjoxNTE2MjM5MDIyfQ.8gxviXEOuIBQk2LvTYHSf-wXVhnEKC3h4yM5nlOF4zA",
+					RuleID:      "aws-secret-access-key",
+					Tags:        []string{"aws", "secret", "decoded:base64", "decode-depth:2"},
+					StartLine:   11,
+					EndLine:     11,
+					StartColumn: 39,
+					EndColumn:   344,
+					Entropy:     4.721928,
+				},
+				{ // Encoded Small secret at the end to make sure it's picked up by the decoding
+					Description: "Small Secret",
+					Secret:      "small-secret",
+					Match:       "small-secret",
+					File:        "tmp.go",
+					Line:        "\nc21hbGwtc2VjcmV0",
+					RuleID:      "small-secret",
+					Tags:        []string{"small", "secret", "decoded:base64", "decode-depth:1"},
+					StartLine:   15,
+					EndLine:     15,
+					StartColumn: 2,
+					EndColumn:   17,
+					Entropy:     3.0849626,
+				},
+				{ // Secret where the decoded match goes outside the encoded value
+					Description: "Overlapping",
+					Secret:      "decoded-secret-value",
+					Match:       "secret=decoded-secret-value",
+					File:        "tmp.go",
+					Line:        "\nsecret=ZGVjb2RlZC1zZWNyZXQtdmFsdWU=",
+					RuleID:      "overlapping",
+					Tags:        []string{"overlapping", "decoded:base64", "decode-depth:1"},
+					StartLine:   18,
+					EndLine:     18,
+					StartColumn: 2,
+					EndColumn:   36,
+					Entropy:     3.3037016,
+				},
+				{ // Secret where the decoded match goes outside the encoded value and then encoded again
+					Description: "Overlapping",
+					Secret:      "decoded-secret-value",
+					Match:       "secret=decoded-secret-value",
+					File:        "tmp.go",
+					Line:        "\nc2VjcmV0PVpHVmpiMlJsWkMxelpXTnlaWFF0ZG1Gc2RXVT0=",
+					RuleID:      "overlapping",
+					Tags:        []string{"overlapping", "decoded:base64", "decode-depth:2"},
+					StartLine:   20,
+					EndLine:     20,
+					StartColumn: 2,
+					EndColumn:   49,
+					Entropy:     3.3037016,
+				},
+			},
+		},
 	}
 
 	for _, tt := range tests {
@@ -347,6 +491,7 @@ func TestDetect(t *testing.T) {
 		cfg.Path = filepath.Join(configPath, tt.cfgName+".toml")
 		assert.Equal(t, tt.wantError, err)
 		d := NewDetector(cfg)
+		d.MaxDecodeDepth = maxDecodeDepth
 		d.baselinePath = tt.baselinePath
 
 		findings := d.Detect(tt.fragment)

+ 75 - 0
testdata/config/base64_encoded.toml

@@ -0,0 +1,75 @@
+# We want to be able to find this key regardless if it's b64 encoded or not
+[[rules]]
+  id = 'private-key'
+  description = 'Private Key'
+  regex = '''(?i)-----BEGIN[ A-Z0-9_-]{0,100}PRIVATE KEY(?: BLOCK)?-----[\s\S-]*?-----END[ A-Z0-9_-]{0,100}PRIVATE KEY(?: BLOCK)?-----'''
+  tags = ['key', 'private']
+  keywords = [
+      '-----begin',
+  ]
+
+# This exists to test what would happen if a normal rule matched something that
+# also gets decoded. We don't want to break anyone's existing rules that might
+# be looking for specific segments of b64 encoded data.
+[[rules]]
+  id = 'b64-encoded-private-key'
+  description = 'Private Key'
+  regex = '''(?:LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0t|0tLS0tQkVHSU4gUFJJVkFURSBLRVktLS0tL|tLS0tLUJFR0lOIFBSSVZBVEUgS0VZLS0tLS)[a-zA-Z0-9+\/]+={0,3}'''
+  tags = ['key', 'private']
+  keywords = [
+    'ls0tls1crudjtibquklwqvrfietfws0tls0t',
+    '0tls0tqkvhsu4gufjjvkfursblrvktls0tl',
+    'tls0tlujfr0loifbssvzbveugs0vzls0tls',
+  ]
+
+
+[[rules]]
+  id = 'aws-iam-unique-identifier'
+  description = 'AWS IAM Unique Identifier'
+  # The negated group at the beginning consists of ASCII ranges
+  regex = '''(?:^|[^!$-&\(-9<>-~])((?:A3T[A-Z0-9]|ACCA|ABIA|AKIA|AGPA|AIDA|AROA|AIPA|ANPA|ANVA|ASIA)[A-Z0-9]{16})\b'''
+  tags = ['aws', 'identifier']
+  entropy = 3.2
+  secretGroup = 1
+  keywords = [
+    'a3t',
+    'abia',
+    'acca',
+    'agpa',
+    'aida',
+    'aipa',
+    'akia',
+    'anpa',
+    'anva',
+    'aroa',
+    'asia',
+  ]
+
+[[rules]]
+  id = 'aws-secret-access-key'
+  description = 'AWS Secret Access Key'
+  regex = '''(?i)aws[\w\-]{0,32}[\'\"]?\s*?[:=\(]\s*?[\'\"]?([a-z0-9\/+]{40})\b'''
+  tags = ['aws', 'secret']
+  entropy = 4
+  secretGroup = 1
+  keywords = [
+    'aws',
+  ]
+
+[[rules]]
+  # Use a small secret to make sure position shifts from decoding are
+  # tracked appropriately
+  id = 'small-secret'
+  description = 'Small Secret'
+  regex = '''\bsmall-secret\b'''
+  tags = ['small', 'secret']
+
+[[rules]]
+  # When the example value is decoded this will overlap and this is here to
+  # test that the location information is reported accurately when the match
+  # goes outside the bounds of the encoded value
+  id = 'overlapping'
+  description = 'Overlapping'
+  regex = '''secret=(decoded-secret-value)'''
+  tags = ['overlapping']
+  secretGroup = 1
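
To see this config exercised end to end, the new tests can be run directly; `detect_test.go` wires the detector with `maxDecodeDepth = 8` before calling `Detect`:

```
go test ./detect/ -run 'TestDecode|TestDetect' -v
```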