Save the source of an open webpage from Safari using AppleScript

How can I write a script that saves a webpage in Safari in some way?

(The code will be used in a more complex script later, so a kludgy solution using System Events is not an option.) Many searches for a script that uses the save-source function left me none the wiser, so an answer here may be the first on the internet. I have included some material below that may be useful.

Potentially Useful Material

These two entries from the AppleScript dictionary for Safari look useful:

document n [see also Standard Suite]: Safari document representing the active tab in the window.

properties:

  • source (text, r/o): The HTML source of the web page currently loaded in the document.
  • text (text, r/o): The text of the web page currently loaded in the document. Changes to the text are not reflected on the web page.
  • URL (text): The current URL of the document.

and later:

save v: Save the object.

save specifier: the object for the command

  • [as text]: file type for saving data.
  • [in]: file to save the object.

A script that almost does what I want

This script saves the HTML document, but the result looks broken compared to the files saved with the Safari function "Export as Page" manually:

     tell application "Safari"
         (* Get a reference to the document *)
         set myDoc to document of front window
         (* Get the source of the page *)
         set mySrc to source of myDoc
         (* Get a file name *)
         set myName to "Message_" & "0001" & ".html" -- the # will be modified later
         tell application "Finder"
             (* Get a path to the front window *)
             set myPath to (target of front window) as string
             (* Get a file path *)
             set filePath to myPath & myName
             (* Create a brand new file *)
             set openRef to open for access (myPath & myName) with write permission
             (* Save the document source *)
             write mySrc to openRef
             (* Close the file *)
             close access openRef
         end tell
     end tell

The scripts I wrote so far

  • This is my first attempt:

     tell application "Safari"
         set pageToSaveSafariWindowIn to "Q:Ø:"
         set pageToBeSaved to front window
         save document pageToBeSaved as source in alias pageToSaveSafariWindowIn
     end tell

    Here are the resulting logs:

     tell application "Safari"
         get window 1
             --> window id 6017
         save document (window id 6017) as source in alias "Q:Ø:"
             --> error number -1700 from window id 6017 to integer

    and

    error "Safari got an error: Can’t make window id 6017 into type integer." number -1700 from window id 6017 to integer

  • And one more attempt:

     tell application "Safari"
         save source of document in "Q:Ø:"
     end tell

    which produces this log output:

    error "Can’t get source of document." number -1728 from «class conT» of document

+4
5 answers

This is a way to save a window full of tabs. The original UI scripting was written by StefanK, a.k.a. Stefan Klime, of MacScripter fame. The tabs end up as web archive files, and you can configure whether files that were already written are overwritten or skipped. It does not save duplicate tabs, and you can set a property to decide whether a tab is closed once it has been saved.

Take a look at MacScripter for any updates; the direct link is in the script.

You could use wget, but I decided to use UI scripting, since wget has to download material that is already in your browser, and is also a kludge to program.

     property tlvl: me
     # Release 1.0.1
     # © 2012 McUsr and put in Public Domain under GPL 1.0
     # Please refer to this post: http://macscripter.net/post.php?tid=30892
     property shallClose: false # set this to false if you don't want to close the windows, just saving them
     property dontOverWriteSavedTabs: false # set this to true if you don't want to overwrite already saved tabs in the folder 
     script saveTabsInSafariWindowsToFolder
         property parent: AppleScript

         property scripttitle: "SafariSaveTabs"
         on run
             if downloadWindowInFront() then return 0 # activates Safari

             local script_cache
             set script_cache to my storage's scriptCache()

             set saveFolder to POSIX path of (getHFSFolder({theMessage:"Choose or create folder to save Safari-tabs in.", hfsPath:DefaultLocation of script_cache as alias}))
             if saveFolder = false then return 0 -- we were obviously mistaken, about what we wanted to do.

             my storage's saveParenFolderInScriptCache(saveFolder, script_cache)

             tell application "Safari"
                 tell its window 1
                     local tabc, oldidx
                     set tabc to count tabs of it
                     if not tlvl's shallClose then
                         set oldidx to index of current tab
                         tell tab tabc to do JavaScript "self.focus()"
                     end if
                     local saveCounter
                     set saveCounter to 1 -- regulates setting of save folder to only first time in Safari.
                     repeat while tabc > 0
                         local theUrl, theIdx, theProtocol, alreadyClosed

                         set {theUrl, theIdx, alreadyClosed} to {URL of its current tab, index of its current tab, false}

                         if my isntAduplicateTab (theIdx, it) then

                             set theProtocol to my urlprotocol (theUrl)
                             if theProtocol is in {"http", "https"} then
                                 # save it
                                 set saveCounter to my saveCurrentTab (saveFolder, saveCounter)
                             else if theProtocol is "file" then
                                 # make an alias of it 
                                 my makeAliasForAFurl (saveFolder, theUrl)
                             end if
                         else
                             if tlvl's shallClose then
                                 close current tab
                                 set alreadyClosed to true
                             end if
                         end if

                         if not alreadyClosed and tlvl's shallClose then
                             close current tab of it
                             set tabc to tabc - 1
                         else if not tlvl's shallClose then
                             set tabc to tabc - 1
                             if tabc > 0 then tell tab tabc to do JavaScript "self.focus()"
                         end if
                     end repeat
                     # move forwards
                     if not tlvl's shallClose then
                         tell tab oldidx to do JavaScript "self.focus()"
                     end if
                 end tell
             end tell
         end run


         to makeAliasForAFurl (destinationFolder, furl)
             local ti, tids, thefilePath
             set ti to "file://"
             set {tids, AppleScript text item delimiters} to {AppleScript text item delimiters, ti}
             set thefilePath to text item 2 of furl
             set AppleScript text item delimiters to tids
             set theFile to POSIX file thefilePath as alias
             set theFolder to POSIX file destinationFolder
             tell application "Finder"
                 make alias at theFolder to theFile
                 # I don't care if there was one there from before, as it could equally
                 # be a file with the same name.
             end tell
         end makeAliasForAFurl

         to saveCurrentTab (destinationFolder, timeNumber)
             tell application id "sfri" to activate
             tell application "System Events"
                 set UI elements enabled to true
                 tell process "Safari"
                     keystroke "s" using {command down}
                     tell window 1
                         repeat until exists sheet 1
                             delay 0.2
                         end repeat
                         tell sheet 1
                             if timeNumber = 1 then -- We'll set the savepath upon first call
                                 keystroke "g" using {command down, shift down}
                                 repeat until exists sheet 1
                                     delay 0.2
                                 end repeat
                                 tell sheet 1
                                     set value of text field 1 to destinationFolder
                                     click button 1
                                     delay 0.1
                                 end tell
                             end if
                             keystroke return
                             delay 0.2
                             if exists sheet 1 then -- We are being asked if we want to overwrite already saved tab
                                 if dontOverWriteSavedTabs then
                                     keystroke return # if it was already saved.  We don't overwrite it
                                     click button 3
                                 else
                                     keystroke tab
                                     keystroke space # we are to overwrite
                                 end if
                             else
                                 try
                                     set dummy to focused of sheet 1
                                 on error
                                     # click button 1 of panel of application "Safari"
                                     keystroke return

                                     delay 0.2
                                     if exists sheet 1 then -- We are being asked if we want to overwrite already saved tab
                                         if dontOverWriteSavedTabs then
                                             keystroke return # if it was already saved.  We don't overwrite it
                                             click button 3
                                         else
                                             keystroke tab
                                             keystroke space # we are to overwrite
                                         end if
                                     end if
                                 end try
                             end if
                         end tell
                     end tell
                 end tell
             end tell
             set timeNumber to timeNumber + 1
             return timeNumber
         end saveCurrentTab

         on downloadWindowInFront ()
             tell application "Safari"
                 activate
                 set tabCount to count tabs of its window 1
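                 if tabCount
     # [Part of the script is missing from the source at this point: the rest of downloadWindowInFront, the handlers isntAduplicateTab, urlprotocol and getHFSFolder, and the beginning of parentfolder.]
             if (offset of ":" in aPath) > 0 then set colons to true
             if (offset of "/" in aPath) > 0 then set slashes to true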

             if colons and slashes then
                 return null
             else if colons then
                 set origDelims to ":"
             else if slashes then
                 set origDelims to "/"
             else
                 return null
             end if
             local tids
             set {tids, AppleScript text item delimiters} to {AppleScript text item delimiters, origDelims}
             if aPath = "/" then
                 -- we return root when we get root
                 set AppleScript text item delimiters to tids
                 return "/"
             end if
             local theParentFolder
             if text -1 of aPath is in {":", "/"} then
                 set theParentFolder to text items 1 thru -2 of text 1 thru -2 of aPath
             else
                 set theParentFolder to text items 1 thru -2 of aPath
             end if
             set theParentFolder to theParentFolder as text
             if slashes and theParentFolder = "" then set theParentFolder to "/"
             -- sets the root path if we got a folder one level below it
             if colons and (":" is not in theParentFolder) then set theParentFolder to theParentFolder & ":"
             -- we return volumename, if we are given volumename
             set AppleScript text item delimiters to tids
             return theParentFolder
         end parentfolder


         script storage
             property cachespath: ((path to library folder from user domain as text) & "caches:" & "net.mcusr." & scripttitle)

             on scriptCache ()

                 local script_cache
                 try
                     set script_cache to load script alias (my cachespath)
                 on error
                     script newScriptCache
                         property DefaultLocation: (path to desktop folder as text)
                         # edit any of those with default values
                     end script

                     set script_cache to newScriptCache
                 end try
                 return script_cache
             end scriptCache

             to saveScriptCache (theCache)
                 store script theCache in my cachespath replacing yes
             end saveScriptCache

             to saveParenFolderInScriptCache (theFolderToSaveIn, script_cache)
                 local containingFolder
                 set containingFolder to (parentfolder of saveTabsInSafariWindowsToFolder for theFolderToSaveIn) & "/"
                 local theLoc
                 set theLoc to POSIX file containingFolder as alias
                 set DefaultLocation of script_cache to theLoc
                 my saveScriptCache (script_cache)
             end saveParenFolderInScriptCache
         end script
     end script
     tell saveTabsInSafariWindowsToFolder to run

Enjoy

+2

I found what I consider the best solution:

     tell application "Safari"
         activate
         set URL of document 1 to "http://www.apple.com"
         delay 5
         set myString to source of document 1
     end tell
     set newFile to POSIX file "/Users/myUsername/test.html"
     open for access newFile with write permission
     write myString to newFile
     close access newFile

Notes:

  • “source of document 1” seems to be filled with the correct source code only AFTER the web page has fully loaded. Hence the need for the delay. You may be able to get away with a shorter delay.

  • Several solutions recommend using curl. I have not tried this, but I assume it may be problematic for dynamically generated pages.

  • The above works on OSX 10.8.4. Not tested for other versions.
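
The fixed five-second delay and the bare write in the script above can be tightened a little. The following is only a minimal sketch under stated assumptions, not a tested drop-in: it assumes that Safari accepts `do JavaScript` from scripts (recent versions require "Allow JavaScript from Apple Events" to be enabled), and the handler names `waitForLoad` and `writeTextToFile`, the 30-second limit, and the output path are made up for illustration. It polls `document.readyState` instead of sleeping for a fixed time, truncates the file before writing, and writes UTF-8.

```applescript
-- Hypothetical helper: wait until the front document reports that it has finished loading.
on waitForLoad(maxSeconds)
    repeat (maxSeconds * 2) times
        tell application "Safari"
            set readyState to (do JavaScript "document.readyState" in document 1)
        end tell
        if readyState is "complete" then return true
        delay 0.5
    end repeat
    return false -- gave up after roughly maxSeconds
end waitForLoad

-- Hypothetical helper: write text to a POSIX path as UTF-8, replacing any old contents.
on writeTextToFile(theText, posixPath)
    set fileRef to open for access (POSIX file posixPath) with write permission
    try
        set eof of fileRef to 0 -- truncate, so a shorter page leaves no stale bytes behind
        write theText to fileRef as «class utf8»
        close access fileRef
    on error errMsg number errNum
        close access fileRef
        error errMsg number errNum
    end try
end writeTextToFile

tell application "Safari"
    activate
    set URL of document 1 to "http://www.apple.com"
end tell
delay 1 -- assumption: give Safari a moment to start navigating away from the old page
if waitForLoad(30) then
    tell application "Safari" to set pageSource to source of document 1
    writeTextToFile(pageSource, "/Users/myUsername/test.html")
end if
```

The readiness check is a heuristic: a page that keeps loading content through scripts after `readyState` becomes "complete" may still yield an incomplete source.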

+4

Automator will do it. Here is the workflow: http://cl.ly/450m0Q21463p16322P1i

Automator → Actions → Internet → Get Current Webpage from Safari → Download URLs.

+2
     set hyperlink to "http://www.google.com/"
     set sourceCode to (do shell script "curl " & hyperlink)
     do shell script "echo " & quoted form of sourceCode & " >> /Users/name/Desktop/test.csv"

You can wrap this in a repeat loop, and it will append the source of each listed page to the end of the created document, e.g.:

     set hyperlink to "http://www.aRepetitivePageSite.com/2014?page="
     set your_count to 1
     repeat until your_count = 10
         set sourceCode to (do shell script "curl " & (hyperlink & your_count as string as text))
         do shell script "echo " & quoted form of sourceCode & " >> /Users/name/Desktop/test.csv"
         set your_count to your_count + 1
     end repeat
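
If the goal is one HTML file per page rather than one appended file, a variation on the curl approach is to quote the URL for the shell and let curl write the file itself, so the HTML never passes through an AppleScript string and `echo`. This is a minimal sketch; the variable names `pageURL` and `outPath` and the file name are assumptions for illustration. Note that curl fetches the page on its own, so it does not see whatever Safari currently has loaded.

```applescript
-- Assumed example values; replace with your own URL and destination.
set pageURL to "http://www.google.com/"
set outPath to (POSIX path of (path to desktop folder)) & "test.html"

-- -s: no progress meter, -L: follow redirects, -o: write the response body to the given file.
do shell script "curl -sL " & quoted form of pageURL & " -o " & quoted form of outPath
```

Using `quoted form of` for both values keeps characters such as `?` and `&` in the URL, or spaces in the path, from being interpreted by the shell.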
+1

If you had to do this task manually, you would view the source in Safari, copy it to the clipboard, switch to an HTML source editor and create a new document, paste the source, choose "Save", navigate to the "Documents" folder, name the document, and then save it.

So when you want to write an AppleScript to accomplish this task, the key point is that you still use the same applications, but instead of driving them manually, you drive them with AppleScript. An excellent AppleScriptable HTML source editor is TextWrangler, which is free on the Mac App Store.

Once you have a web browser (Safari) to get the HTML source from the web and an HTML source editor (TextWrangler) to create and save the HTML document, you can write a very small, very easy to write, very easy to read, very easy to maintain AppleScript, like this one:

     tell application "Safari"
         activate
         if document 1 exists then
             set theDocumentTitle to the name of document 1
             set theDocumentSource to the source of document 1
             tell application "TextWrangler"
                 activate
                 set theNewDocument to make new document with properties {name:theDocumentTitle, text:theDocumentSource}
                 set theDocumentsFolderPath to the path to the documents folder as text
                 set theSaveFilePath to theDocumentsFolderPath & theDocumentTitle & ".html"
                 save theNewDocument to file theSaveFilePath
             end tell
         end if
     end tell

... that just asks Safari for the name and source of its frontmost document, and then asks TextWrangler to use this information to create and save the corresponding HTML document in the Documents folder. These are tasks that these two applications are very good at. You hardly have to ask twice or explain much.

0
