Get the on-screen position of a web page with Selenium WebDriver

Is there a way to get the screen coordinates of an HTML window (page body) using Selenium WebDriver?

+6
8 answers

We have run into this several times and have not yet found an elegant solution in WebDriver itself (ILocatable exposes a member intended for this, but the method has not been implemented yet).

I use UIAutomation to get an AutomationElement for the browser window and then walk the tree to find the actual document window. One drawback I have noticed: browsers occasionally change their window structure, so the search conditions have to be adjusted to match each time.

Here is some sample code (I removed some proprietary parts, so it is more elegant on my end, but this should work in C#):

    public static Rectangle GetAbsCoordinates(this IWebElement element)
    {
        var driver = GetDriver(element);
        var handle = GetIntPtrHandle(driver);
        var ae = AutomationElement.FromHandle(handle);
        AutomationElement doc = null;
        var caps = ((RemoteWebDriver) driver).Capabilities;
        var browserName = caps.BrowserName;

        // Each browser exposes its document area differently in the UIAutomation tree.
        switch (browserName)
        {
            case "safari":
                var conditions = new AndCondition(
                    new PropertyCondition(AutomationElement.ControlTypeProperty, ControlType.Pane),
                    new PropertyCondition(AutomationElement.ClassNameProperty, "SearchableWebView"));
                doc = ae.FindFirst(TreeScope.Descendants, conditions);
                break;
            case "firefox":
                doc = ae.FindFirst(TreeScope.Descendants,
                    new PropertyCondition(AutomationElement.ControlTypeProperty, ControlType.Document));
                break;
            case "chrome":
                doc = ae.FindFirst(TreeScope.Descendants,
                    new PropertyCondition(AutomationElement.NameProperty, "Chrome Legacy Window"));
                if (doc == null)
                {
                    // Fallback for Chrome builds that lay out the tree differently.
                    doc = ae.FindFirst(TreeScope.Descendants,
                        new PropertyCondition(AutomationElement.NameProperty, "Google Chrome"));
                    if (doc == null)
                        throw new Exception("unable to find element containing browser window");
                    doc = doc.FindFirst(TreeScope.Descendants,
                        new PropertyCondition(AutomationElement.ControlTypeProperty, ControlType.Document));
                }
                break;
            case "internet explorer":
                doc = ae.FindFirst(TreeScope.Descendants, new AndCondition(
                    new PropertyCondition(AutomationElement.ControlTypeProperty, ControlType.Pane),
                    new PropertyCondition(AutomationElement.ClassNameProperty, "TabWindowClass")));
                break;
        }

        if (doc == null)
            throw new Exception("unable to find element containing browser window");

        // Screen position of the document area plus the element's position inside the DOM.
        var iWinLeft = (int) doc.Current.BoundingRectangle.Left;
        var iWinTop = (int) doc.Current.BoundingRectangle.Top;
        var coords = ((ILocatable) element).Coordinates;
        var rect = new Rectangle(iWinLeft + coords.LocationInDom.X,
                                 iWinTop + coords.LocationInDom.Y,
                                 element.Size.Width, element.Size.Height);
        return rect;
    }

    public static IWebDriver GetDriver(this IWebElement e)
    {
        return ((IWrapsDriver) e).WrappedDriver;
    }

    // Timeout is a constant from the author's own code (not shown here).
    public static IntPtr GetIntPtrHandle(this IWebDriver driver, int timeoutSeconds = Timeout)
    {
        var end = DateTime.Now.AddSeconds(timeoutSeconds);
        while (DateTime.Now < end)
        {
            // Searching by AutomationElement is a bit faster (can filter by children only)
            var ele = AutomationElement.RootElement;
            foreach (AutomationElement child in ele.FindAll(TreeScope.Children, Condition.TrueCondition))
            {
                if (!child.Current.Name.Contains(driver.Title)) continue;
                return new IntPtr(child.Current.NativeWindowHandle);
            }
        }
        return IntPtr.Zero;
    }
+4

Hmm, I cannot reply directly to the user asking about Chrome, so I will have to add another answer here.

For UIAutomation, you will want to use a tool called "Inspect" (inspect.exe, supplied free of charge in the Windows SDK 8.1). Older tools like UISpy are likely to work too.

Start Chrome and then run the Inspect tool. Look through the structure of the tree and navigate down to the document that contains the DOM. Turn on highlighting in the tool to make this easier.

Chrome seems to be quite dynamic in how it lays out its control tree; I have had to modify the code several times to match the control I am looking for. If you are using a different version than I am, find the document window in the tree and look at the properties associated with it: these are what I pass into the PropertyCondition to find the control. IntelliSense should show you the different properties you can query, for example AutomationElement.NameProperty. In the example I posted, I noticed a difference between running Chrome on a Windows XP machine and on a Windows 8 machine, hence the null check.

As I said before, this is not elegant, and it would be great if it were built into Selenium (I imagine they have much better ways of determining the coordinates of the DOM area). I think this will also be a problem for people moving to Selenium Grid (which I am looking into myself): as far as I know, I am not sure you can ship a set of supporting DLLs along with Selenium to a remote machine, at least not without a lot of hacks.

If it still does not work for you, tell me your exact OS and Chrome version, and I will try to take a look and give you an exact match for the properties. It is probably best to experiment with it yourself, though, as these things are unfortunately not static.

+1

You can try it this way:

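    WebDriver driver = new FirefoxDriver();
    driver.get("http://www.google.com");
    JavascriptExecutor js = (JavascriptExecutor) driver;
    // executeScript returns a Long for whole numbers and a Double otherwise,
    // so go through Number instead of casting straight to Double.
    Number left = (Number) js.executeScript(
        "var element = document.getElementById('hplogo');"
        + "var position = element.getBoundingClientRect();"
        + "return position.left;");
    System.out.print(left.doubleValue());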
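
Note that getBoundingClientRect() reports a position relative to the browser viewport, not the screen. As a rough sketch (this is not something WebDriver provides), the viewport's on-screen origin can be estimated from window.screenX, window.screenY, window.outerWidth, window.outerHeight, window.innerWidth and window.innerHeight, each read through executeScript. The class and method names below are hypothetical, and the estimate assumes that the window's left, right and bottom borders are equally thick, that all toolbars sit above the page, and that zoom is 100%.

```java
/** Hypothetical helper: estimates where the page viewport sits on the screen. */
final class ViewportOnScreen {

    /** Simple immutable pair of screen coordinates. */
    static final class ScreenPoint {
        final double x;
        final double y;

        ScreenPoint(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Estimates the screen position of the viewport's top-left corner.
     * Inputs are the values of window.screenX, window.screenY, window.outerWidth,
     * window.outerHeight, window.innerWidth and window.innerHeight.
     * ASSUMPTION: left, right and bottom borders are equally thick and all other
     * browser chrome (tabs, toolbars) sits above the page.
     */
    static ScreenPoint viewportOrigin(double screenX, double screenY,
                                      double outerWidth, double outerHeight,
                                      double innerWidth, double innerHeight) {
        double sideBorder = (outerWidth - innerWidth) / 2.0;
        double topChrome = (outerHeight - innerHeight) - sideBorder;
        return new ScreenPoint(screenX + sideBorder, screenY + topChrome);
    }

    /** Shifts a viewport-relative point (e.g. from getBoundingClientRect) by the viewport origin. */
    static ScreenPoint toScreen(ScreenPoint origin, double clientLeft, double clientTop) {
        return new ScreenPoint(origin.x + clientLeft, origin.y + clientTop);
    }

    public static void main(String[] args) {
        // Made-up sample values, not measurements from a real browser.
        ScreenPoint origin = viewportOrigin(100, 50, 1016, 839, 1000, 750);
        ScreenPoint logo = toScreen(origin, 20.5, 30);
        System.out.println(logo.x + ", " + logo.y);
    }
}
```

With docked developer tools, side panels or display scaling the estimate will be off, so treat the result as approximate.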
0

I had a quick look at Chrome, and you might have some luck with the following.

 doc = win.Find.ByConditions(new PropertyCondition(AutomationElement.ClassNameProperty, "Chrome_RenderWidgetHostHWND")); 

I think this class name is consistent for Chrome; it works for me on both older and newer OS versions with Chrome version 34.0.1847.116 m. Hope this helps.

0

The code Zechtitus posted is great. I tried it in IE11 and Chrome version 39.0.2171.95 m, and it worked like a charm. However, I had to pass the actual IWebDriver object instead of using WrappedDriver, because that does not work with Chrome. For reference, I am on Windows 7 Ultimate x64 with Selenium WebDriver 2.44. This is the code I took from Zechtitus and modified:

    public static Rectangle GetAbsCoordinates(IWebDriver driver, IWebElement element)
    {
        var handle = GetIntPtrHandle(driver);
        var ae = AutomationElement.FromHandle(handle);
        AutomationElement doc = null;
        var caps = ((RemoteWebDriver)driver).Capabilities;
        var browserName = caps.BrowserName;

        // Each browser exposes its document area differently in the UIAutomation tree.
        switch (browserName)
        {
            case "safari":
                var conditions = new AndCondition(
                    new PropertyCondition(AutomationElement.ControlTypeProperty, ControlType.Pane),
                    new PropertyCondition(AutomationElement.ClassNameProperty, "SearchableWebView"));
                doc = ae.FindFirst(TreeScope.Descendants, conditions);
                break;
            case "firefox":
                doc = ae.FindFirst(TreeScope.Descendants,
                    new PropertyCondition(AutomationElement.ControlTypeProperty, ControlType.Document));
                break;
            case "chrome":
                doc = ae.FindFirst(TreeScope.Descendants,
                    new PropertyCondition(AutomationElement.NameProperty, "Chrome Legacy Window"));
                if (doc == null)
                {
                    // Fallback for Chrome builds that lay out the tree differently.
                    doc = ae.FindFirst(TreeScope.Descendants,
                        new PropertyCondition(AutomationElement.NameProperty, "Google Chrome"));
                    if (doc == null)
                        throw new Exception("unable to find element containing browser window");
                    doc = doc.FindFirst(TreeScope.Descendants,
                        new PropertyCondition(AutomationElement.ControlTypeProperty, ControlType.Document));
                }
                break;
            case "internet explorer":
                doc = ae.FindFirst(TreeScope.Descendants, new AndCondition(
                    new PropertyCondition(AutomationElement.ControlTypeProperty, ControlType.Pane),
                    new PropertyCondition(AutomationElement.ClassNameProperty, "TabWindowClass")));
                break;
        }

        if (doc == null)
            throw new Exception("unable to find element containing browser window");

        // Screen position of the document area plus the element's position inside the DOM.
        var iWinLeft = (int)doc.Current.BoundingRectangle.Left;
        var iWinTop = (int)doc.Current.BoundingRectangle.Top;
        var coords = ((ILocatable)element).Coordinates;
        var rect = new Rectangle(iWinLeft + coords.LocationInDom.X,
                                 iWinTop + coords.LocationInDom.Y,
                                 element.Size.Width, element.Size.Height);
        return rect;
    }

    public static IntPtr GetIntPtrHandle(this IWebDriver driver, int timeoutSeconds = 20)
    {
        var end = DateTime.Now.AddSeconds(timeoutSeconds);
        while (DateTime.Now < end)
        {
            // Searching by AutomationElement is a bit faster (can filter by children only)
            var ele = AutomationElement.RootElement;
            foreach (AutomationElement child in ele.FindAll(TreeScope.Children, Condition.TrueCondition))
            {
                if (!child.Current.Name.Contains(driver.Title)) continue;
                return new IntPtr(child.Current.NativeWindowHandle);
            }
        }
        return IntPtr.Zero;
    }

I used it like this:

    Rectangle recView = GetAbsCoordinates(MyWebDriverObj, myIWebElementObj);

The correct X and Y are then available in recView.X and recView.Y. As I said, it works for me in both IE11 and Chrome. Good luck.

0

This should work once it is supported:

    WebElement htmlElement = driver.findElement(By.tagName("html"));
    Point viewPortLocation = ((Locatable) htmlElement).getCoordinates().onScreen();
    int x = viewPortLocation.getX();
    int y = viewPortLocation.getY();

However, for now it throws the following error:

    java.lang.UnsupportedOperationException: Not supported yet.
        at org.openqa.selenium.remote.RemoteWebElement$1.onScreen(RemoteWebElement.java:342)

(with org.seleniumhq.selenium:selenium-java:2.46.0)
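
Until onScreen() is implemented, calling code can guard the call and fall back to another approach when UnsupportedOperationException is thrown. Below is a minimal sketch of such a guard. The class and method names are hypothetical, and the suppliers are stand-ins, so no Selenium classes are needed to run it.

```java
import java.util.function.Supplier;

/** Hypothetical helper: tries one way of getting a value, then a fallback. */
final class OnScreenFallback {

    /** Returns primary.get(), or fallback.get() if primary throws UnsupportedOperationException. */
    static <T> T orElseIfUnsupported(Supplier<T> primary, Supplier<T> fallback) {
        try {
            return primary.get();
        } catch (UnsupportedOperationException e) {
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        // Stand-in for ((Locatable) htmlElement).getCoordinates().onScreen(),
        // which throws in the Selenium version mentioned above.
        Supplier<String> notYetSupported = () -> {
            throw new UnsupportedOperationException("Not supported yet.");
        };
        String result = OnScreenFallback.<String>orElseIfUnsupported(
                notYetSupported, () -> "fallback value");
        System.out.println(result);
    }
}
```

In real code the first supplier would wrap the onScreen() call and the second would use one of the workarounds from the other answers.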

0

Yes, it is possible, with a little trick. The code below gets the top position of a web element on screen (more precisely, relative to the top of the visible viewport).

    long scrollPosition = getScrollYPosition();
    long elemYPositionOnScreen = (long) elem.getLocation().getY() - scrollPosition;

    public static long getScrollYPosition() {
        // DriverFactory is the answer author's own helper for obtaining the current driver.
        WebDriver driver = DriverFactory.getCurrentDriver();
        JavascriptExecutor jse = (JavascriptExecutor) driver;
        // window.scrollY can come back as a Long or a Double, so cast to Number.
        Number scrollYPos = (Number) jse.executeScript("return window.scrollY;");
        return scrollYPos.longValue();
    }
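
The subtraction above generalizes to both axes: page coordinates minus the scroll offsets give coordinates relative to the visible viewport (still not the physical screen). Here is a minimal sketch with hypothetical class and method names. It also converts script results through Number, since executeScript returns a Long for whole numbers and a Double for fractional ones.

```java
/** Hypothetical helper: converts page coordinates to viewport coordinates. */
final class ScrollAdjust {

    /** Converts a numeric executeScript result (Long or Double) to a double. */
    static double toDouble(Object scriptResult) {
        if (scriptResult instanceof Number) {
            return ((Number) scriptResult).doubleValue();
        }
        throw new IllegalArgumentException("Expected a number but got: " + scriptResult);
    }

    /** Returns {x, y} relative to the viewport, given page coordinates and scroll offsets. */
    static double[] toViewport(double pageX, double pageY, double scrollX, double scrollY) {
        return new double[] { pageX - scrollX, pageY - scrollY };
    }

    public static void main(String[] args) {
        // Stand-ins for executeScript("return window.scrollX;") and
        // executeScript("return window.scrollY;").
        Object scrollX = Long.valueOf(0);
        Object scrollY = Double.valueOf(120.5);
        // Made-up element location, as getLocation() would report it.
        double[] p = toViewport(300, 900, toDouble(scrollX), toDouble(scrollY));
        System.out.println(p[0] + ", " + p[1]);
    }
}
```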
0

Try this, I hope this helps you:

 Rectangle rec = new Rectangle(element.getLocation(), element.getSize()); 
-1
