perf(media): minor regex simplification

The previous regex used the [ABC..D]*[ABC] pattern, which caused a lot of
backtracking. The new regex stops matching at the first whitespace character or
at the end of the text (and is intended to drop a trailing `.` should one be
present).

The backtracking accounted for around 50% of the CPU time spent in atom.Parse.
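
To make the behaviour described above easier to check, here is a minimal sketch that compiles the new pattern exactly as it appears on the `+` line of the diff and prints capture group 1 for a few inputs. The `firstLink` helper and the sample strings are hypothetical and are not part of `media.go`; how the real code consumes the match is not shown in this commit.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern copied verbatim from the "+" line of the diff.
var textLinkRegex = regexp.MustCompile(`(?mi)(\bhttps?://[^\s]+)[.]?(?:\s|$)`)

// firstLink is a hypothetical helper (an assumption, not taken from media.go).
// It returns capture group 1 of the first match, or "" when nothing matches.
func firstLink(text string) string {
	m := textLinkRegex.FindStringSubmatch(text)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	// Sample inputs are made up for illustration.
	samples := []string{
		"watch https://example.org/video.mp4 now",
		"no link here",
		"ends with a dot: https://example.org/page.",
	}
	for _, s := range samples {
		fmt.Printf("%q -> %q\n", s, firstLink(s))
	}
}
```

One thing worth verifying with the last sample: `[^\s]+` is greedy, so under Go's leftmost-first matching a trailing `.` can end up inside the capture group instead of being consumed by `[.]?`. A lazy `[^\s]+?` would be one possible way to keep it out of the group, though that is a suggestion and not something this commit does.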
jvoisin, 9 months ago
Parent
commit
8660f5e3c7
1 changed file with 1 addition and 1 deletion

+ 1 - 1
internal/reader/media/media.go

@@ -9,7 +9,7 @@ import (
 	"strings"
 )
 
-var textLinkRegex = regexp.MustCompile(`(?mi)(\bhttps?:\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])`)
+var textLinkRegex = regexp.MustCompile(`(?mi)(\bhttps?://[^\s]+)[.]?(?:\s|$)`)
 
 // Specs: https://www.rssboard.org/media-rss
 type MediaItemElement struct {