Selenium for Browser Leverage

Republished from a prior blogpost in August of this year

December 27, 2016

The technical luxuries just do not stop rolling in. I was recently researching Chrome extensions, with the intention of performing browser automation, when I ran into a framework called Selenium which makes this kind of task vastly easier to code. Chrome extensions involve a fair amount of JavaScript coding. With Selenium, however, not only was I able to pick the programming language from about half a dozen choices, but I was able to do just what I wanted in under 20 lines of Python. In the case of Chrome, the server part of the browser driver is built by the Chrome development team, so I can expect the method calls to work as advertised. It gets even better, though, since the Selenium 'wire protocol' is now a draft standard published by the W3C (World Wide Web Consortium). I'm getting all kinds of warm fuzzy feelings that this framework has heavyweight industry support and won't go away anytime soon, so my efforts won't be wasted.

The official purpose of the Selenium framework is to perform QA testing of web applications across various browsers and operating platforms. I have a slightly different usage in mind, in that I am a fiend for creating browser launchers. These are short scripts which launch a browser, navigate to one or more pages, fill in textual form fields where needed, click buttons or otherwise submit forms, and generally get me logged into the different web services I use. I have a couple of reasons for doing this. For one, it is just butt-simple, especially when I am very busy with other things, to click a button and be logged in somewhere. The second reason is computer security, and unless you live under a rock you've noticed this is a hot topic right now (in fact for several years running). Your browser can be easily hooked using XSS (cross-site scripting), and then your cookies are exposed to an attacker. If you stay logged into online services then it is trivial for bad people to get your session cookies, and then they can just change your password, and you don't want to get pwned.

Setup and Usage

So the idea behind a browser launcher is to get logged in quickly, as needed, and then log out immediately when finished. Otherwise you spend half your time fumbling about navigating a web browser and looking for URLs and passwords. Actually there was a third reason I like to do this, and that is to stay abreast of webpage and browser technologies. Setting up these scripts involves examining the structure of a webpage and its elements. How easy is it to set up and use Selenium? From the documentation, here are the requirements.

The first item on the list should be obvious. See the Selenium wiki for official documentation. The second item, in the case of Python, is obtained by installing the selenium package with the following command.

> pip install selenium

The third item on the list can be obtained from the Chrome developers. Your mileage may vary, but I believe it is wise to use the 64-bit version of this executable if you are using 64-bit Python. That's all for the setup. For me the hard part was the gymnastics involved in launching the correct things, with the correct environment. I have two versions of Python installed on my (windows) computer, so for testing Python scripts from a command-line, I created the following shortcut.

C:\Windows\System32\cmd.exe /k "set Path=%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;C:\ProgramsDev1\Python_x64_2711;C:\ProgramsDev1\Python_x64_2711\Scripts; & E: & cd E:\PythonProjects\ChromeLaunchScripts"

This just opens a DOS-box, sets the Path environment variable to the correct version of Python (in this case version 2.7.11), then changes my current working directory to E:\PythonProjects\ChromeLaunchScripts\ , which is where I store my launcher scripts. I also put the chromedriver.exe file in the same directory.
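The same environment setup can also be done from Python itself, which avoids maintaining a long shortcut string. This is only a sketch under assumptions: the helper names build_env and run_script are hypothetical, and the paths in the commented usage are the ones from my shortcut, with python.exe assumed to sit in the install directory.

```python
import os
import subprocess

def build_env(python_dir, base_env=None):
    """Return a copy of the environment with python_dir and its Scripts
    subdirectory placed at the front of the search path."""
    env = dict(os.environ if base_env is None else base_env)
    extra = [python_dir, os.path.join(python_dir, "Scripts")]
    old_path = env.get("PATH", "")
    env["PATH"] = os.pathsep.join(extra + ([old_path] if old_path else []))
    return env

def run_script(python_exe, script, workdir, env):
    """Run one launcher script from workdir with the adjusted environment."""
    return subprocess.call([python_exe, script], cwd=workdir, env=env)

# Hypothetical usage with the directories from the shortcut:
# env = build_env(r"C:\ProgramsDev1\Python_x64_2711")
# run_script(r"C:\ProgramsDev1\Python_x64_2711\python.exe", "scriptname.py",
#            r"E:\PythonProjects\ChromeLaunchScripts", env)
```

The interpreter is passed as a full path on purpose, since on Windows the executable lookup does not necessarily use the search path of the new environment.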

Here is what a typical script looks like, to automate the Chrome browser in Python.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument("--window-size=1762,1200")
options.add_argument("--window-position=102,0")

driver = webdriver.Chrome(chrome_options=options)
driver.get('https://wordpress.com/wp-login.php')

logincss = 'form#loginform input#user_login'
login = driver.find_element_by_css_selector(logincss)
login.send_keys('4draft')

pwdcss = 'form#loginform input#user_pass'
pwd = driver.find_element_by_css_selector(pwdcss)
pwd.send_keys('my password')

formcss = 'form#loginform input.button'
form = driver.find_element_by_css_selector(formcss)
form.submit()

driver.close()

From the command environment, run the script by issuing the command

> python scriptname.py

Command box executing python test1.py

There isn't much in this basic script. They tell you to close the driver when you're finished with it, but the truth is that this sequence of steps goes so fast you may not even have a chance to see the browser opening. Because of that, and also because my intention is to have the browser remain open after the script finishes, I usually don't call driver.close() .
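Since every launcher repeats the same find, type, and submit steps, the login can also be described as data and driven by one small function. What follows is a minimal sketch, not part of my original scripts: fill_and_submit, WORDPRESS_FIELDS and WORDPRESS_SUBMIT are hypothetical names. It uses the generic find_element call with the string "css selector" (the value behind By.CSS_SELECTOR), which as far as I know is accepted by both older and current Selenium releases.

```python
def fill_and_submit(driver, fields, submit_css):
    """Type text into each (css, text) pair, then submit the form.

    driver is any Selenium WebDriver; fields is a list of
    (css_selector, text) tuples in the order they should be filled in.
    """
    for css, text in fields:
        # Locate the input by CSS selector and type into it
        driver.find_element("css selector", css).send_keys(text)
    # Calling submit() on an element inside a form submits that form
    driver.find_element("css selector", submit_css).submit()

# The WordPress login from the script above, expressed as data
WORDPRESS_FIELDS = [
    ("form#loginform input#user_login", "4draft"),
    ("form#loginform input#user_pass", "my password"),
]
WORDPRESS_SUBMIT = "form#loginform input.button"
```

With a driver created as in the script above, the whole login becomes the single call fill_and_submit(driver, WORDPRESS_FIELDS, WORDPRESS_SUBMIT).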

Obviously there are other functions you can call to customize the script. In fact Selenium has a rich API for driving the browser navigation. After looking around the documentation, this is the best API listing that I found. If you look closely you'll notice the language used here is JavaScript, but all the method names, parameters, class hierarchies, and example usage are provided. You'll have to translate these to your own language of choice. You can click the 'View Source' link in the upper-right corner to view the actual source code on GitHub. There is quite a lot of functionality there, much of which I will probably never use.
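For Python, much of that translation is a naming change from camelCase to snake_case: findElement becomes find_element and sendKeys becomes send_keys. The helper below is a rough sketch of that rule only, and js_to_python_name is a hypothetical name of my own. The rule is not universal, since some JavaScript getters such as getTitle() show up in Python as plain properties (driver.title), so treat the result as a first guess to check against the Python documentation.

```python
import re

def js_to_python_name(js_name):
    """Convert a camelCase JavaScript method name to snake_case."""
    # Insert an underscore before each capital letter that follows a
    # lowercase letter or digit, then lowercase the whole name.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", js_name).lower()
```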

I also looked into ways of setting up single-click launchers for these scripts, since I don't always like to work from a command-line. My goal was a single-click launch, without having a DOS-box appear, and which could be run as a short-cut from another location. On MSWindows, this could be done using AutoIt. It could also be done using PowerShell, however I always hear people complaining about executing PowerShell scripts directly from the shell; apparently this is just too much power. For simplicity I turned to a Visual Basic script, such as this one.

Set ShellObj = CreateObject ("Wscript.Shell")
Dim strArgs
strArgs = "setenv.bat scriptname.py"
ShellObj.Run strArgs, 0, true

The setenv.bat file will generically set up the environment, much like I did in the Python command short-cut created earlier. Here are the contents of setenv.bat

set Path=%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;C:\ProgramsDev1\Python_x64_2711;C:\ProgramsDev1\Python_x64_2711\Scripts;
E:
cd E:\PythonProjects\ChromeLaunchScripts
python %1

So what happens is the launcher.vbs script, when double-clicked from the Windows shell, will open an invisible DOS-box and execute setenv.bat , passing in scriptname.py as its only parameter; setenv.bat then creates the correct environment for running Python scripts, and then runs the Python script that was passed in as a parameter. I have all these files in the same directory by the way. Now I can create a short-cut to the .vbs launcher file and place it anywhere on my system. In fact I have a system toolbar on my desktop, and I can execute the .vbs files from a menu entry.
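Because each Python script needs its own .vbs file that differs only in the script name, the launcher text can be generated rather than copied by hand. This is a small sketch under that assumption; make_vbs_launcher is a hypothetical helper, and it simply reproduces the four lines shown earlier with the script name filled in.

```python
def make_vbs_launcher(script_name, batch_file="setenv.bat"):
    """Return the text of a .vbs launcher that runs batch_file with
    script_name as its only argument, in a hidden window."""
    lines = [
        'Set ShellObj = CreateObject("WScript.Shell")',
        'Dim strArgs',
        'strArgs = "%s %s"' % (batch_file, script_name),
        # 0 hides the window; True waits for the batch file to finish
        'ShellObj.Run strArgs, 0, True',
    ]
    # Windows script files conventionally use CRLF line endings
    return "\r\n".join(lines) + "\r\n"
```

The returned string can then be written to a .vbs file next to setenv.bat and the Python scripts.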

System toolbar for launching shortcuts.

Linux

For Linux, the setup and usage are identical to what I just described for the MSWindows system. This was in fact one of my goals: to use both a browser (Chrome) and a script language (Python) which work consistently across multiple operating environments. Note that in Linux, Selenium had difficulty figuring out where the chromedriver was located, so I had to place it on the system path, whereas in Windows I could place chromedriver in the same directory the Python script was launched from. The big difference in Linux is in how to set up shortcuts and launchers, since these are somewhat dependent on your Linux distribution. I plan on writing an article which gives details on setting up a top-panel launcher menu in Gnome3 as a Gnome Shell Extension.
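One way to make the driver location explicit on both systems is to search for it yourself: first the script's own directory, then each directory on the system path. The sketch below assumes nothing about Selenium; find_driver and default_search_dirs are hypothetical helpers that only look for the file.

```python
import os

def find_driver(name, search_dirs):
    """Return the first existing path to name in search_dirs, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None

def default_search_dirs(script_dir):
    """Script directory first, then every directory on the system path."""
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    return [script_dir] + [d for d in path_dirs if d]
```

In the Selenium version I used, the resulting path could be handed to webdriver.Chrome through its executable_path argument; newer releases, as far as I know, expect it to be wrapped in a Service object instead, so check the documentation for your version.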

I said earlier that in my Python automation scripts I decided to not call the driver.close() function. This is my choice, but it does lead to another problem (see the following image).

Windows processes list, showing multiple chromedriver.exe open in memory.

After numerous invocations of your launcher scripts, you will end up with numerous orphaned chromedriver executables in memory, since they were never closed properly. The Selenium people never provided the functionality to close the driver while leaving the browser running, since it was never viewed as necessary for browser-webapp QA testing. As a result I was left to my own devices, and I have created some boiler-plate Python code which I add to every automation script. This additional Python code, shown below, simply identifies the chromedriver in memory and kills it, and this executes after the automation code finishes.

import os
import signal
import ctypes

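def linuxkill(pstring):
    # List all processes, keep the lines mentioning pstring, and drop the grep itself
    for line in os.popen("ps ax | grep " + pstring + " | grep -v grep"):
        fields = line.split()
        pid = fields[0]  # the first column of `ps ax` is the process ID
        os.kill(int(pid), signal.SIGKILL)

def winkill(pstring):
    print(pstring)  # parenthesized form works in both Python 2 and 3
    # Skip the three header lines of tasklist; column 0 is the image name, column 1 the PID
    PyIds = [int(line.split()[1]) for line in os.popen('tasklist').readlines()[3:] if line.split()[0] == pstring]
    for pid in PyIds:
        os.system("taskkill /F /pid %i" % pid)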

if os.name == 'nt':
    winkill('chromedriver.exe')
elif os.name == 'posix':
    linuxkill('chromedriver')

The three import lines go at the very top of the automation script file, and the rest of the code can go at the very bottom of the file. I have never tested this on an Android or MacOS system, but I imagine it works very much like it does on Linux, since Android is built on a Linux kernel and MacOS is a Unix-like (BSD-derived) system.
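The winkill function above splits the fixed-width tasklist output on whitespace, which works for chromedriver.exe but would misread image names containing spaces. A slightly sturdier variation, sketched here as an assumption rather than something I have run in my launchers, is to ask tasklist for CSV output (its /FO CSV and /NH switches) and parse that. The function name pids_from_tasklist_csv is hypothetical.

```python
import csv

def pids_from_tasklist_csv(text, image_name):
    """Extract PIDs for image_name from `tasklist /FO CSV /NH` output."""
    pids = []
    for row in csv.reader(text.splitlines()):
        # Columns: image name, PID, session name, session number, memory
        if len(row) >= 2 and row[0].lower() == image_name.lower():
            pids.append(int(row[1]))
    return pids
```

On Windows the text would come from os.popen('tasklist /FO CSV /NH').read(), and each returned PID can be passed to taskkill exactly as before. Keeping the parsing in its own function also means it can be checked against saved output without killing anything.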

In case anyone is curious about toying with Selenium, I did some preliminary work using Notepad++ , since it performs syntax-highlighting on Python files, and code-folding as well. Later I moved to using Eclipse+PyDev for the code-completion (like IntelliSense) and library integration. Finally, if you are learning Selenium in the Python language, here is a great site to use for a reference.

 

-R. Foreman