Remove an HTML tag that has a specific class

I'm forcing myself to learn scripting exclusively in AppleScript, but I'm currently stuck trying to remove a specific tag with a class. I tried to find solid documentation and examples, but they seem very limited.

Here is the HTML that I have:

<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl <span class="foo">shoulder</span> biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class="bar">Pig brisket</span> jowl ham pastrami <span class="foo">jerky</span> strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>

I am trying to remove the tags for one specific class, so that every <span class="foo"> and its closing </span> are deleted while the text between them is kept. The desired result:

<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl shoulder biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class="bar">Pig brisket</span> jowl ham pastrami jerky strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>

I know how to do this with do shell script and through the terminal, but I want to find out what is available through the AppleScript dictionary.

While researching, I found a way to strip all HTML tags using:

on removeMarkupFromText(theText)
    set tagDetected to false
    set theCleanText to ""
    repeat with a from 1 to length of theText
        set theCurrentCharacter to character a of theText
        if theCurrentCharacter is "<" then
            set tagDetected to true
        else if theCurrentCharacter is ">" then
            set tagDetected to false
        else if tagDetected is false then
            set theCleanText to theCleanText & theCurrentCharacter as string
        end if
    end repeat
    return theCleanText
end removeMarkupFromText

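However, that removes every HTML tag, which is not what I want. I also searched SO for ways to parse HTML with AppleScript, but what I found did not fit this case either.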

In BBEdit I am familiar with Balance Tags (Balance in the menu), but when I run:

tell application "BBEdit"
    activate
    find "<span class=\"foo\">" searching in text 1 of text document "test.html" options {search mode:grep, wrap around:true} with selecting match
    balance tags
end tell

it does not produce the result I want.

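I have also tried the tag support in the dictionary with find tag, like this:

set spanTarget to (find tag "span" start_offset counter)
|class| of attributes of tag of spanTarget

combined with Balance Tags, but that did not work either.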

So, in pure AppleScript, how can I remove a tag that has a specific class while keeping its contents?


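Here is one approach that uses only text item delimiters and a recursive handler. It removes the first opening tag and the first closing tag that follows it, then calls itself until no opening tag is left.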

on run
    set theHTML to "<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl <span class=\"foo\">shoulder</span> biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class=\"bar\">Pig brisket</span> jowl ham pastrami <span class=\"foo\">jerky</span> strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>" 
    set theHTML to removeTag(theHTML, "<span class=\"foo\">", "</span>")
end run

on removeTag(theText, startTag, endTag)
    if theText contains startTag then
        -- Split the text on the opening tag
        set AppleScript's text item delimiters to startTag
        set tempText to text items of (theText as string)
        set AppleScript's text item delimiters to {""}

        -- Item 2 is the text that follows the first opening tag
        set middleText to item 2 of tempText as string
        if middleText contains endTag then
            set AppleScript's text item delimiters to endTag
            set tempText2 to text items of (middleText as string)
            set AppleScript's text item delimiters to {""}
            -- Drop only the first closing tag after the opening tag
            set newString to implode(tempText2, endTag)
            set item 2 of tempText to newString
        end if
        -- Drop the first opening tag; later ones are handled by the next pass
        set newString to implode(tempText, startTag)
        return removeTag(newString, startTag, endTag) -- recursive
    else
        return theText
    end if
end removeTag

on implode(parts, tag)
    -- Join items 1 and 2 with nothing between them (delimiters are {""} here)
    set newString to items 1 thru 2 of parts as string
    if (count of parts) > 2 then
        -- Rejoin the remaining items with the tag they were split on
        set newList to {newString} & (items 3 thru -1 of parts)
        set AppleScript's text item delimiters to tag
        set newString to (newList as string)
        set AppleScript's text item delimiters to {""}
    end if
    return newString
end implode
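
The same idea can also be written without recursion. The following is only a sketch under stated assumptions: the handler name unwrapTag is hypothetical (it is not part of the answer above), and it assumes that spans of the target class do not contain nested spans, because it pairs each opening tag with the first closing tag after it.

```applescript
-- Hypothetical helper, not from the original answer.
-- Removes every startTag and the first endTag that follows it, keeping the text between.
on unwrapTag(theText, startTag, endTag)
    set savedDelimiters to AppleScript's text item delimiters
    -- Every piece after the first one began right after an opening tag
    set AppleScript's text item delimiters to startTag
    set pieces to text items of theText
    set AppleScript's text item delimiters to endTag
    repeat with i from 2 to (count of pieces)
        set subParts to text items of (item i of pieces)
        if (count of subParts) > 1 then
            -- Glue item 1 to the rest; the rest is rejoined with endTag,
            -- so only the first closing tag in this piece disappears
            set item i of pieces to (item 1 of subParts) & ((items 2 thru -1 of subParts) as text)
        end if
    end repeat
    -- Join the pieces with nothing between them, which drops the opening tags
    set AppleScript's text item delimiters to ""
    set cleanText to pieces as text
    set AppleScript's text item delimiters to savedDelimiters
    return cleanText
end unwrapTag

set sampleHTML to "<p>a <span class=\"foo\">b</span> c <span class=\"bar\">d</span> e <span class=\"foo\">f</span> g</p>"
set cleanedHTML to unwrapTag(sampleHTML, "<span class=\"foo\">", "</span>")
```

Saving and restoring the delimiters keeps the handler from changing how the rest of a script splits and joins text.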

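You can use a regular expression with the find command of BBEdit or TextWrangler.

If the span and its contents are on a single line, use: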

find "<span class=\"foo\">.+?</span>" searching in text 1 of text document 1 options {search mode:grep, wrap around:true} with selecting match

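How .+?</span> works:

  • . matches any character (except a line break)
  • + repeats the preceding token one or more times
  • ? makes the repetition non-greedy, so it matches as few characters as possible
  • So the match starts at the opening span and stops at the first </span> after it. Without the ?, the match would run on to the last </span> that BBEdit can find.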

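If the contents of the span can include line breaks, add (?s) at the start of the pattern so that . also matches line breaks: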

find "(?s)<span class=\"foo\">.+?</span>" searching in text 1 of text document 1 options {search mode:grep, wrap around:true} with selecting match

This matches the span whether its contents are on one line or contain line breaks, for example:

<span class="foo">shoulder</span>

<span class="foo">shoulder </span>

<span class="foo">shoulder xxxx yyyy zzzz</span>


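In AppleScript, use the replace command (BBEdit or TextWrangler) to delete every match. Note that replacing with an empty string removes the span together with its contents: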

replace "(?s)<span class=\"foo\">.+?</span>" using "" searching in text 1 of text document 1 options {search mode:grep, wrap around:true}
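
To keep the text inside the span, as the question asks, the contents can be captured in a group and written back. This is an untested sketch: it assumes BBEdit's grep replacement syntax, where \1 refers to the first capture group, and it reuses the options from the command above.

```applescript
tell application "BBEdit"
    -- (.+?) captures the contents of the span; \1 in the replacement puts them back,
    -- so only the opening and closing tags are removed.
    replace "(?s)<span class=\"foo\">(.+?)</span>" using "\\1" searching in text 1 of text document 1 options {search mode:grep, wrap around:true}
end tell
```

The backslash is doubled because AppleScript string literals use the backslash as an escape character.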

This kind of regular expression work can also be done through the AppleScriptObjC bridge, which is currently supported. Paste this code into Script Editor and run it:

use AppleScript version "2.5" -- for El Capitan or later
use framework "Foundation"
use scripting additions

on stringByMatching:thePattern inString:theString replacingWith:theTemplate
    set theNSString to current application's NSString's stringWithString:theString
    -- Regex options: "." also matches line breaks, and ^ and $ match at line boundaries
    set theOptions to (current application's NSRegularExpressionDotMatchesLineSeparators as integer) + (current application's NSRegularExpressionAnchorsMatchLines as integer)
    set theExpression to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:theOptions |error|:(missing value)
    -- This options parameter takes matching options, not regex options, so pass 0
    set theResult to theExpression's stringByReplacingMatchesInString:theNSString options:0 range:{location:0, |length|:theNSString's |length|()} withTemplate:theTemplate
    return theResult as text
end stringByMatching:inString:replacingWith:

set theHTML to "<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl <span class='foo'>SHOULDER</span> biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class='bar'>PIG BRISKET</span> jowl ham pastrami <span class='foo'>JERKY</span> strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>"

set modifiedHTML to its stringByMatching:"<span .*?>(.*?)</span>" inString:theHTML replacingWith:"$1"

This works with well-formed HTML. As user foo pointed out above, a browser can cope with badly formed HTML, but a regular expression like this one probably cannot.
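
Note that the pattern <span .*?>(.*?)</span> unwraps every span, including the one with class "bar". A possible variation that targets a single class is sketched below. The handler name unwrapSpansWithClass:inHTML: is hypothetical, and the sketch assumes that class is the only attribute on the span, that it holds exactly one class name, and that the spans are not nested.

```applescript
use AppleScript version "2.5"
use framework "Foundation"
use scripting additions

-- Hypothetical handler, not from the original answer.
on unwrapSpansWithClass:className inHTML:theHTML
    -- Escape the class name so it is matched literally
    set safeName to (current application's NSRegularExpression's escapedPatternForString:className) as text
    -- Group 1 is the quote character (single or double), group 2 is the span's contents
    set thePattern to "<span\\s+class=([\"'])" & safeName & "\\1\\s*>(.*?)</span>"
    set theNSString to current application's NSString's stringWithString:theHTML
    set theExpression to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:(current application's NSRegularExpressionDotMatchesLineSeparators) |error|:(missing value)
    -- Replace each matching span with its contents only
    set theResult to theExpression's stringByReplacingMatchesInString:theNSString options:0 range:{location:0, |length|:theNSString's |length|()} withTemplate:"$2"
    return theResult as text
end unwrapSpansWithClass:inHTML:

set sampleHTML to "<p>a <span class='foo'>B</span> c <span class='bar'>D</span></p>"
set modifiedHTML to my unwrapSpansWithClass:"foo" inHTML:sampleHTML
```

The backreference \1 requires the closing quote to be the same character as the opening one, so the handler accepts both class='foo' and class="foo".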
