// Copyright 2017 Frédéric Guillot. All rights reserved.
// Use of this source code is governed by the Apache 2.0
// license that can be found in the LICENSE file.

package rss

import (
	"encoding/xml"
	"path"
	"strconv"
	"strings"
	"time"

	"github.com/miniflux/miniflux/crypto"
	"github.com/miniflux/miniflux/logger"
	"github.com/miniflux/miniflux/model"
	"github.com/miniflux/miniflux/reader/date"
	"github.com/miniflux/miniflux/reader/sanitizer"
	"github.com/miniflux/miniflux/url"
)

type rssFeed struct {
	XMLName      xml.Name  `xml:"rss"`
	Version      string    `xml:"version,attr"`
	Title        string    `xml:"channel>title"`
	Links        []rssLink `xml:"channel>link"`
	Language     string    `xml:"channel>language"`
	Description  string    `xml:"channel>description"`
	PubDate      string    `xml:"channel>pubDate"`
	ItunesAuthor string    `xml:"http://www.itunes.com/dtds/podcast-1.0.dtd channel>author"`
	Items        []rssItem `xml:"channel>item"`
}

// rssLink matches both plain <link> elements, whose URL is in the text
// content, and <atom:link> elements, whose URL is in the href attribute.
// XMLName.Space holds the element's namespace so the two can be told apart.
type rssLink struct {
	XMLName xml.Name
	Data    string `xml:",chardata"`
	Href    string `xml:"href,attr"`
	Rel     string `xml:"rel,attr"`
}

type rssCommentLink struct {
	XMLName xml.Name
	Data    string `xml:",chardata"`
}

type rssAuthor struct {
	XMLName xml.Name
	Data    string `xml:",chardata"`
	Name    string `xml:"name"`
	Inner   string `xml:",innerxml"`
}

type rssEnclosure struct {
	URL    string `xml:"url,attr"`
	Type   string `xml:"type,attr"`
	Length string `xml:"length,attr"`
}

type rssItem struct {
	GUID              string           `xml:"guid"`
	Title             string           `xml:"title"`
	Links             []rssLink        `xml:"link"`
	OriginalLink      string           `xml:"http://rssnamespace.org/feedburner/ext/1.0 origLink"`
	CommentLinks      []rssCommentLink `xml:"comments"`
	Description       string           `xml:"description"`
	EncodedContent    string           `xml:"http://purl.org/rss/1.0/modules/content/ encoded"`
	PubDate           string           `xml:"pubDate"`
	Date              string           `xml:"http://purl.org/dc/elements/1.1/ date"`
	Authors           []rssAuthor      `xml:"author"`
	Creator           string           `xml:"http://purl.org/dc/elements/1.1/ creator"`
	EnclosureLinks    []rssEnclosure   `xml:"enclosure"`
	OrigEnclosureLink string           `xml:"http://rssnamespace.org/feedburner/ext/1.0 origEnclosureLink"`
}
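
// SiteURL returns the trimmed text of the first <link> element without a
// namespace, or an empty string if there is none.
func (r *rssFeed) SiteURL() string {
	for _, element := range r.Links {
		if element.XMLName.Space == "" {
			return strings.TrimSpace(element.Data)
		}
	}

	return ""
}

// FeedURL returns the trimmed href of the first <atom:link> element, or an
// empty string if there is none.
func (r *rssFeed) FeedURL() string {
	for _, element := range r.Links {
		if element.XMLName.Space == "http://www.w3.org/2005/Atom" {
			return strings.TrimSpace(element.Href)
		}
	}

	return ""
}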

// Transform converts the parsed document into a model.Feed. Missing values
// are filled in as follows: the feed title from the site URL, an entry author
// from the channel's itunes:author, an entry URL from the site URL, and an
// entry title from the entry URL. Markup is stripped from entry authors, and
// entry URLs are made absolute against the site URL; if that fails, the URL
// is kept unchanged.
func (r *rssFeed) Transform() *model.Feed {
	feed := new(model.Feed)
	feed.SiteURL = r.SiteURL()
	feed.FeedURL = r.FeedURL()
	feed.Title = strings.TrimSpace(r.Title)

	if feed.Title == "" {
		feed.Title = feed.SiteURL
	}

	for _, item := range r.Items {
		entry := item.Transform()

		if entry.Author == "" && r.ItunesAuthor != "" {
			entry.Author = r.ItunesAuthor
		}
		entry.Author = strings.TrimSpace(sanitizer.StripTags(entry.Author))

		if entry.URL == "" {
			entry.URL = feed.SiteURL
		} else {
			entryURL, err := url.AbsoluteURL(feed.SiteURL, entry.URL)
			if err == nil {
				entry.URL = entryURL
			}
		}

		if entry.Title == "" {
			entry.Title = entry.URL
		}

		feed.Entries = append(feed.Entries, entry)
	}

	return feed
}
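
// PublishedDate parses dc:date when present, otherwise pubDate. If neither is
// set, or the value cannot be parsed (the error is logged), it returns the
// current time.
func (r *rssItem) PublishedDate() time.Time {
	value := r.PubDate
	if r.Date != "" {
		value = r.Date
	}

	if value != "" {
		result, err := date.Parse(value)
		if err != nil {
			logger.Error("rss: %v", err)
			return time.Now()
		}

		return result
	}

	return time.Now()
}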
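
// Author returns the <name> child or, failing that, the inner XML of the first
// <author> element that has either; otherwise it returns dc:creator.
// rssFeed.Transform strips any markup from the result.
func (r *rssItem) Author() string {
	for _, element := range r.Authors {
		if element.Name != "" {
			return element.Name
		}

		if element.Inner != "" {
			return element.Inner
		}
	}

	return r.Creator
}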
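
// Hash returns crypto.Hash of the GUID, or of the item URL when the GUID is
// empty, and an empty string when both are empty.
func (r *rssItem) Hash() string {
	for _, value := range []string{r.GUID, r.URL()} {
		if value != "" {
			return crypto.Hash(value)
		}
	}

	return ""
}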

// Content returns content:encoded when present, otherwise the description.
func (r *rssItem) Content() string {
	if r.EncodedContent != "" {
		return r.EncodedContent
	}

	return r.Description
}
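
// URL returns feedburner:origLink when present. Otherwise it returns the first
// link that qualifies: an <atom:link> with a non-empty href and a rel accepted
// by isValidLinkRelation, or any link element with text content.
func (r *rssItem) URL() string {
	if r.OriginalLink != "" {
		return r.OriginalLink
	}

	for _, link := range r.Links {
		if link.XMLName.Space == "http://www.w3.org/2005/Atom" && link.Href != "" && isValidLinkRelation(link.Rel) {
			return strings.TrimSpace(link.Href)
		}

		if link.Data != "" {
			return strings.TrimSpace(link.Data)
		}
	}

	return ""
}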

// Enclosures converts the <enclosure> elements into a model.EnclosureList. A
// length that is not a valid integer is stored as 0. When
// feedburner:origEnclosureLink is set and its file name occurs in an
// enclosure URL, that URL is replaced with the original link.
func (r *rssItem) Enclosures() model.EnclosureList {
	enclosures := make(model.EnclosureList, 0)

	for _, enclosure := range r.EnclosureLinks {
		length, _ := strconv.ParseInt(enclosure.Length, 10, 0)
		enclosureURL := enclosure.URL

		if r.OrigEnclosureLink != "" {
			filename := path.Base(r.OrigEnclosureLink)
			if strings.Contains(enclosureURL, filename) {
				enclosureURL = r.OrigEnclosureLink
			}
		}

		enclosures = append(enclosures, &model.Enclosure{
			URL:      enclosureURL,
			MimeType: enclosure.Type,
			Size:     length,
		})
	}

	return enclosures
}
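
// CommentsURL returns the trimmed text of the first <comments> element without
// a namespace, so namespaced elements sharing the local name (slash:comments,
// for instance) are ignored.
func (r *rssItem) CommentsURL() string {
	for _, commentLink := range r.CommentLinks {
		if commentLink.XMLName.Space == "" {
			return strings.TrimSpace(commentLink.Data)
		}
	}

	return ""
}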

// Transform converts the item into a model.Entry. Feed-level fallbacks are
// applied afterwards by rssFeed.Transform.
func (r *rssItem) Transform() *model.Entry {
	entry := new(model.Entry)
	entry.URL = r.URL()
	entry.CommentsURL = r.CommentsURL()
	entry.Date = r.PublishedDate()
	entry.Author = r.Author()
	entry.Hash = r.Hash()
	entry.Content = r.Content()
	entry.Title = strings.TrimSpace(r.Title)
	entry.Enclosures = r.Enclosures()
	return entry
}
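
// isValidLinkRelation reports whether an <atom:link> rel value may supply the
// entry URL: the empty string, one of a fixed set of relation names, or a value
// starting with "http" (treated as an extension relation URI).
func isValidLinkRelation(rel string) bool {
	switch rel {
	case "", "alternate", "enclosure", "related", "self", "via":
		return true
	default:
		return strings.HasPrefix(rel, "http")
	}
}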
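
The namespace checks in `SiteURL` and `FeedURL` work because `encoding/xml` matches the namespace-free path `channel>link` against `<link>` and `<atom:link>` alike, and stores each element's resolved namespace URI in the untagged `XMLName` field. A minimal standalone sketch of that mechanism follows; `channelLink`, `channelDoc`, and `channelURLs` are hypothetical names used for illustration only and are not part of the package.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

const atomNS = "http://www.w3.org/2005/Atom"

// channelLink is a hypothetical, trimmed-down copy of rssLink. The untagged
// XMLName field receives each element's name, including its resolved namespace.
type channelLink struct {
	XMLName xml.Name
	Data    string `xml:",chardata"`
	Href    string `xml:"href,attr"`
}

// channelDoc is a hypothetical, trimmed-down copy of rssFeed. The tag names no
// namespace, so it collects <link> and <atom:link> children alike.
type channelDoc struct {
	XMLName xml.Name      `xml:"rss"`
	Links   []channelLink `xml:"channel>link"`
}

// channelURLs mirrors SiteURL and FeedURL: the first <link> without a
// namespace gives the site URL, the first Atom link gives the feed URL.
func channelURLs(doc string) (site, feed string, err error) {
	var d channelDoc
	if err = xml.Unmarshal([]byte(doc), &d); err != nil {
		return "", "", err
	}
	for _, l := range d.Links {
		if l.XMLName.Space == "" {
			site = strings.TrimSpace(l.Data)
			break
		}
	}
	for _, l := range d.Links {
		if l.XMLName.Space == atomNS {
			feed = strings.TrimSpace(l.Href)
			break
		}
	}
	return site, feed, nil
}

func main() {
	doc := `<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="https://example.org/feed.xml" rel="self"/>
    <link> https://example.org/ </link>
  </channel>
</rss>`
	site, feed, err := channelURLs(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println("site:", site)
	fmt.Println("feed:", feed)
}
```

Because selection depends on the namespace rather than on element order, the site URL comes from the plain `<link>` and the feed URL from the `atom:link` `href` whichever appears first.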
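
`rssItem.URL` applies a fixed precedence: `feedburner:origLink` first, then the first link that is either an Atom link with an accepted `rel` or a link with text content. The sketch below restates that precedence over a flattened, hypothetical `itemLink` type so it can run outside the package; `validRel` is a copy of `isValidLinkRelation`, and `itemURL` is an illustrative name, not the package's API.

```go
package main

import (
	"fmt"
	"strings"
)

const atomNS = "http://www.w3.org/2005/Atom"

// itemLink is a hypothetical flattened stand-in for rssLink: Space is the
// element's namespace, Href and Rel its attributes, Data its text content.
type itemLink struct {
	Space, Href, Rel, Data string
}

// validRel copies isValidLinkRelation: the empty string, a few relation
// names, or any value starting with "http" is accepted.
func validRel(rel string) bool {
	switch rel {
	case "", "alternate", "enclosure", "related", "self", "via":
		return true
	}
	return strings.HasPrefix(rel, "http")
}

// itemURL follows the same precedence as rssItem.URL.
func itemURL(origLink string, links []itemLink) string {
	if origLink != "" {
		return origLink
	}
	for _, l := range links {
		// An Atom link with a usable href and an accepted rel wins.
		if l.Space == atomNS && l.Href != "" && validRel(l.Rel) {
			return strings.TrimSpace(l.Href)
		}
		// Otherwise any link with text content is used.
		if l.Data != "" {
			return strings.TrimSpace(l.Data)
		}
	}
	return ""
}

func main() {
	links := []itemLink{
		{Space: atomNS, Href: "https://example.org/post/1#comments", Rel: "replies"},
		{Data: " https://example.org/post/1 "},
	}
	fmt.Println(itemURL("", links))
	fmt.Println(itemURL("https://origin.example.org/post/1", links))
}
```

With the sample links, the Atom link marked `rel="replies"` is skipped, so the plain `<link>` text is used; when an `origLink` value is supplied it takes priority over both.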
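
The enclosure handling in `rssItem.Enclosures` can be exercised in isolation. In this sketch, `enclosure` and `rawEnclosure` are hypothetical stand-ins for `model.Enclosure` and `rssEnclosure`, and `toEnclosures` is an illustrative name; it reproduces the method's two rules: a length that does not parse as an integer is stored as 0, and the FeedBurner original link replaces a proxied URL only when its file name, as returned by `path.Base`, occurs in that URL.

```go
package main

import (
	"fmt"
	"path"
	"strconv"
	"strings"
)

// enclosure is a hypothetical stand-in for model.Enclosure.
type enclosure struct {
	URL      string
	MimeType string
	Size     int64
}

// rawEnclosure holds the attribute values of an <enclosure> element.
type rawEnclosure struct {
	URL, Type, Length string
}

// toEnclosures applies the same rules as rssItem.Enclosures.
func toEnclosures(raw []rawEnclosure, origEnclosureLink string) []enclosure {
	out := make([]enclosure, 0, len(raw))
	for _, e := range raw {
		// ParseInt returns 0 for a non-numeric or empty length.
		size, _ := strconv.ParseInt(e.Length, 10, 0)
		u := e.URL
		// Swap in the original link when its file name appears in the URL.
		if origEnclosureLink != "" && strings.Contains(u, path.Base(origEnclosureLink)) {
			u = origEnclosureLink
		}
		out = append(out, enclosure{URL: u, MimeType: e.Type, Size: size})
	}
	return out
}

func main() {
	raw := []rawEnclosure{
		{URL: "http://proxy.example.net/show/episode42.mp3", Type: "audio/mpeg", Length: "1048576"},
		{URL: "http://proxy.example.net/show/cover.jpg", Type: "image/jpeg", Length: ""},
	}
	for _, e := range toEnclosures(raw, "https://cdn.example.org/audio/episode42.mp3") {
		fmt.Println(e.URL, e.MimeType, e.Size)
	}
}
```

Because the check is a plain substring match on the file name, an original link carrying a query string (for example `episode42.mp3?source=rss`) would usually not match, and every enclosure whose URL contains the file name is rewritten, not only the first.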