##################
NODRIVER
##################

**********************
for docs click `here`_
**********************

.. _here: https://second-hand-friends.github.io/nodriver/

**This package provides next level webscraping and browser automation
using a relatively simple interface.**

* **This is the official successor of the Undetected-Chromedriver python package.**
* **No more webdriver, no more selenium**

Direct communication provides even better resistance against web application firewalls (WAFs),
while performance gets a massive boost.
This module is, contrary to undetected-chromedriver, fully asynchronous.

What makes this package different from other known packages
is the optimization to stay undetected for most anti-bot solutions.

Another focus point is usability and quick prototyping, so expect a lot to work `-as is-`,
with most method parameters having `best practice` defaults.
Using 1 or 2 lines, this is up and running, providing best practice config
by default.

While usability and convenience are important, it is also easy
to fully customize everything using the entire array of
CDP domains, methods and events available.

Some features
=============

* A blazing fast undetected chrome (-ish) automation library
* No chromedriver binary or Selenium dependency
* This means a big performance increase and fewer detections!
* Up and running in 1 line of code*
* uses a fresh profile on each run, cleans up on exit
* save and load cookies to file to avoid repeating tedious login steps
* smart element lookup, by selector or text, including iframe content.
  this can also be used as a wait condition for an element to appear, since it will retry
  for the duration of the timeout until found.
  single element lookup by text using tab.find() accepts a best_match flag, which will not
  naively return the first match, but will match candidates by closest matching text length.
* descriptive ``__repr__`` for elements, which represents the element as html
* utility function to convert a running undetected_chromedriver.Chrome instance
  to a nodriver.Browser instance and continue from there
* packed with helpers and utility methods for most used and important operations

what is new
=============

**tab.xpath("/some/x/path[#contains**

**tab.cf_verify()**
  finds the checkbox and clicks it.
  this only works when NOT in expert mode.
  currently built-in for English only.
  requires the opencv-python package to be installed.

.. video:: cf_verify_.mp4
   :autoplay:
   :playsinline:
   :muted:
   :width: 500

**tab.bypass_insecure_connection_warning()**
  convenience method for the insecure page warning,
  for example when a certificate is invalid.

**tab.open_external_debugger()**
  lets you inspect the tab without breaking your connection

**tab.get_local_storage()**
  get localstorage content

**tab.set_local_storage(dict)**
  set localstorage content

**tab.add_handler(someEvent, callback)**
  callback may accept a single argument (event), or 2 arguments (event, tab).

**start(expert=True)**
  does some hacking for more experienced users. It disables web security and origin-trials,
  and ensures shadow-roots are always open.

Some examples of what the api looks like
================================================

* ``elem.text``
* ``elem.text_all``
* ``elem.parent.parent.parent.attrs``
* ``anchor_elem.href and anchor_elem['href']``
* ``anchor_elem.href = 'someotherthing'; await anchor_elem.save()``
* ``elem.children[-1].children[0].children[4].parent.parent``
* ``await html5video_element.record_video()``
* ``await html5video_element('pause')``
* ``await html5video_element.apply('''(el) => el.currentTime = 0''')``
* ``tab = await browser.get(url, new_tab=True)``
* ``tab_win = await browser.get(url, new_window=True)``
* ``first = await tab.find('search text')``
* ``best = await tab.find('search text', best_match=True)``
* ``all_results = await tab.find_all('search text')``
* ``first_submit_button = await tab.select(selector='button[type=submit]')``
* ``inputs_in_form = await tab.select_all('form input')``

Installation
=============

.. code-block::

    pip install nodriver

.. _getting-started-commands:

usage example
===============

The aim of this project (just like undetected-chromedriver, somewhere long ago)
is to keep it short and simple, so you can quickly open an editor or interactive session,
type or paste a few lines and off you go.

.. code-block:: python

    import asyncio
    import nodriver as uc

    async def main():
        browser = await uc.start()
        page = await browser.get('https://www.nowsecure.nl')

        await page.save_screenshot()
        await page.get_content()
        await page.scroll_down(150)
        elems = await page.select_all('*[src]')
        for elem in elems:
            await elem.flash()

        page2 = await browser.get('https://twitter.com', new_tab=True)
        page3 = await browser.get('https://github.com/ultrafunkamsterdam/nodriver', new_window=True)

        for p in (page, page2, page3):
            await p.bring_to_front()
            await p.scroll_down(200)
            await p   # wait for events to be processed
            await p.reload()
            if p != page3:
                await p.close()

    if __name__ == '__main__':
        # since asyncio.run never worked (for me)
        uc.loop().run_until_complete(main())

A more concrete example, which can be found in the ./example/ folder,
shows a script to create a twitter account.

.. code-block:: python

    import random
    import string
    import logging

    logging.basicConfig(level=30)

    import nodriver as uc

    months = [
        "january",
        "february",
        "march",
        "april",
        "may",
        "june",
        "july",
        "august",
        "september",
        "october",
        "november",
        "december",
    ]


    async def main():
        driver = await uc.start()

        tab = await driver.get("https://twitter.com")

        # wait for text to appear instead of a static number of seconds to wait
        # this does not always work as expected, due to speed.
        print('finding the "create account" button')
        create_account = await tab.find("create account", best_match=True)

        print('"create account" => click')
        await create_account.click()

        print("finding the email input field")
        email = await tab.select("input[type=email]")

        # sometimes, email field is not shown, because phone is being asked instead
        # when this occurs, find the small text which says "use email instead"
        if not email:
            use_mail_instead = await tab.find("use email instead")
            # and click it
            await use_mail_instead.click()

            # now find the email field again
            email = await tab.select("input[type=email]")

        randstr = lambda k: "".join(random.choices(string.ascii_letters, k=k))

        # send keys to email field
        print('filling in the "email" input field')
        await email.send_keys("".join([randstr(8), "@", randstr(8), ".com"]))

        # find the name input field
        print("finding the name input field")
        name = await tab.select("input[type=text]")

        # again, send random text
        print('filling in the "name" input field')
        await name.send_keys(randstr(8))

        # since there are 3 select fields on the tab, we can use unpacking
        # to assign each field
        print('finding the "month" , "day" and "year" fields in 1 go')
        sel_month, sel_day, sel_year = await tab.select_all("select")

        # await sel_month.focus()
        print('filling in the "month" input field')
        await sel_month.send_keys(months[random.randint(0, 11)].title())

        # await sel_day.focus()
        # i don't want to bother with month-lengths and leap years,
        # so stay within 1..28 (day 0 does not exist)
        print('filling in the "day" input field')
        await sel_day.send_keys(str(random.randint(1, 28)))

        # await sel_year.focus()
        # i don't want to bother with age restrictions
        print('filling in the "year" input field')
        await sel_year.send_keys(str(random.randint(1980, 2005)))

        await tab

        # let's handle the cookie nag as well
        cookie_bar_accept = await tab.find("accept all", best_match=True)
        if cookie_bar_accept:
            await cookie_bar_accept.click()

        await tab.sleep(1)

        next_btn = await tab.find(text="next", best_match=True)
        # for btn in reversed(next_btns):
        await next_btn.mouse_click()

        print("sleeping 2 seconds")
        await tab.sleep(2)  # visually see what part we're actually in

        print('finding "next" button')
        next_btn = await tab.find(text="next", best_match=True)
        print('clicking "next" button')
        await next_btn.mouse_click()

        # just wait for some button, before we continue
        await tab.select("[role=button]")

        print('finding "sign up" button')
        sign_up_btn = await tab.find("Sign up", best_match=True)
        # we need the second one
        print('clicking "sign up" button')
        await sign_up_btn.click()

        print('the rest of the "implementation" is out of scope')
        # further implementation outside of scope
        await tab.sleep(10)
        driver.stop()

        # verification code per mail


    if __name__ == "__main__":
        # since asyncio.run never worked (for me)
        # i use
        uc.loop().run_until_complete(main())
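
The ``best_match`` flag of ``tab.find()`` is described above as matching candidates by closest
matching text length instead of naively returning the first hit. The idea can be illustrated
without a browser. What follows is only a minimal sketch of that selection rule, not nodriver's
actual implementation; ``pick_best_match`` and the sample strings are hypothetical names made up
for this illustration.

```python
def pick_best_match(search_text, candidates):
    """Hypothetical helper: among the candidates containing search_text
    (case-insensitive), return the one whose length is closest to it."""
    needle = search_text.lower()
    matches = [c for c in candidates if needle in c.lower()]
    if not matches:
        return None
    # the smallest length difference wins; ties keep the earliest candidate
    return min(matches, key=lambda c: abs(len(c) - len(needle)))


# made-up element texts, in document order
texts = [
    "By signing up you agree to the create account terms",
    "Create account",
    "Already have an account? Sign in",
]

# a naive lookup would return the first string, the long legal notice
print(pick_best_match("create account", texts))  # prints: Create account
```

This shows why the example script passes ``best_match=True`` when looking for short button
labels such as "next" or "create account": longer texts that merely contain the phrase lose out
to the element whose text is closest in length to the search string.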