perf(media): minor regex simplification

The previous regex used the [ABC..D]*[ABC] pattern, which resulted in a lot of
backtracking. The new regex stops matching at the first whitespace character or
at the end of the text (and is intended to drop a trailing `.`, should one be present).

The backtracking accounted for around 50% of the CPU time spent in atom.Parse.
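
To make the change easier to reason about, here is a minimal sketch that runs both patterns from the diff against a few sample strings and prints capture group 1. The variable names `oldTextLinkRegex` and `newTextLinkRegex`, the helper `firstLink`, and the sample inputs are assumptions made for illustration and are not part of the commit. One thing worth checking with it: because `[^\s]+` is greedy, a trailing `.` may end up inside the capture group rather than being consumed by `[.]?`.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both patterns are copied verbatim from the diff.
// The variable names are hypothetical, chosen only for this sketch.
var (
	oldTextLinkRegex = regexp.MustCompile(`(?mi)(\bhttps?:\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])`)
	newTextLinkRegex = regexp.MustCompile(`(?mi)(\bhttps?://[^\s]+)[.]?(?:\s|$)`)
)

// firstLink is a hypothetical helper: it returns capture group 1 of the
// first match of re in text, or "" when there is no match.
func firstLink(re *regexp.Regexp, text string) string {
	m := re.FindStringSubmatch(text)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	// Illustrative inputs only; they are not taken from the project's tests.
	samples := []string{
		"Watch https://example.org/video.mp4 now",
		"Watch https://example.org/video.mp4.",
		"no link here",
	}
	for _, s := range samples {
		fmt.Printf("input: %q\n  old: %q\n  new: %q\n",
			s, firstLink(oldTextLinkRegex, s), firstLink(newTextLinkRegex, s))
	}
}
```

Comparing the `old` and `new` lines for the second sample shows how each pattern treats a sentence-ending period, which is the case the commit message calls out.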
jvoisin 11 months ago
Parent
Commit
8660f5e3c7
1 changed file with 1 addition and 1 deletion
+ 1 - 1
internal/reader/media/media.go

@@ -9,7 +9,7 @@ import (
 	"strings"
 )
 
-var textLinkRegex = regexp.MustCompile(`(?mi)(\bhttps?:\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])`)
+var textLinkRegex = regexp.MustCompile(`(?mi)(\bhttps?://[^\s]+)[.]?(?:\s|$)`)
 
 // Specs: https://www.rssboard.org/media-rss
 type MediaItemElement struct {