Capturing a full-page scrolling screenshot of a website with Python Selenium WebDriver

Author: admin · Posted: 21-04-30 13:19

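A quick search turns up many ways to capture an entire web page with Python Selenium WebDriver.
Most of them, however, only work in headless mode.

So I am archiving here a full-page capture script that works in normal (non-headless) mode.

- Source: http://seleniumpythonqa.blogspot.com/2015/08/generate-full-page-screenshot-in-chrome.html

It scrolls down the page's y coordinate one viewport at a time, captures each part, and stitches the parts into one large image.
In short, it is painstaking, craftsman-like code.

The script requires PIL (Python Imaging Library); installation instructions are below.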

test.py
```
"""
This script uses a simplified version of the one here:
https://snipt.net/restrada/python-selenium-workaround-for-full-page-screenshot-using-chromedriver-2x/

It contains the *crucial* correction added in the comments by Jason Coutu.
"""

import sys

from selenium import webdriver
import unittest

import util


class Test(unittest.TestCase):
    """ Demonstration: Get Chrome to generate fullscreen screenshot """

    def setUp(self):
        # Plain (non-headless) Chrome session
        self.driver = webdriver.Chrome()

    def tearDown(self):
        self.driver.quit()

    def test_fullpage_screenshot(self):
        ''' Generate document-height screenshot '''
        url = "http://effbot.org/imagingbook/introduction.htm"
        self.driver.get(url)
        # Writes the stitched full-page image to test.png
        util.fullpage_screenshot(self.driver, "test.png")


if __name__ == "__main__":
    unittest.main(argv=[sys.argv[0]])
```

util.py
```
import os
import time

from PIL import Image


def fullpage_screenshot(driver, file):
    """Scroll through the page, capture each viewport, and stitch the parts into `file`."""
    print("Starting chrome full page screenshot workaround ...")

    # Page size and viewport size as reported by the browser
    total_width = driver.execute_script("return document.body.offsetWidth")
    total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
    viewport_width = driver.execute_script("return document.body.clientWidth")
    viewport_height = driver.execute_script("return window.innerHeight")
    print("Total: ({0}, {1}), Viewport: ({2},{3})".format(total_width, total_height, viewport_width, viewport_height))
    rectangles = []

    # Split the page into viewport-sized rectangles: (left, top, right, bottom)
    i = 0
    while i < total_height:
        ii = 0
        top_height = i + viewport_height

        if top_height > total_height:
            top_height = total_height

        while ii < total_width:
            top_width = ii + viewport_width

            if top_width > total_width:
                top_width = total_width

            print("Appending rectangle ({0},{1},{2},{3})".format(ii, i, top_width, top_height))
            rectangles.append((ii, i, top_width, top_height))

            ii = ii + viewport_width

        i = i + viewport_height

    stitched_image = Image.new('RGB', (total_width, total_height))
    previous = None
    part = 0

    for rectangle in rectangles:
        # The first rectangle is captured without scrolling (page assumed to be at the top)
        if previous is not None:
            driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
            print("Scrolled To ({0},{1})".format(rectangle[0], rectangle[1]))
            time.sleep(0.2)  # give the page time to settle after scrolling

        file_name = "part_{0}.png".format(part)
        print("Capturing {0} ...".format(file_name))

        driver.get_screenshot_as_file(file_name)
        screenshot = Image.open(file_name)

        # The browser cannot scroll past the bottom, so the last row of captures
        # really starts at total_height - viewport_height
        if rectangle[1] + viewport_height > total_height:
            offset = (rectangle[0], total_height - viewport_height)
        else:
            offset = (rectangle[0], rectangle[1])

        print("Adding to stitched image with offset ({0}, {1})".format(offset[0], offset[1]))
        stitched_image.paste(screenshot, offset)

        # Release the temporary part image and delete its file
        del screenshot
        os.remove(file_name)
        part = part + 1
        previous = rectangle

    stitched_image.save(file)
    print("Finishing chrome full page screenshot workaround...")
    return True
```
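
The offset handling for the last row in util.py is easy to get wrong, so it may help to check it without a browser. The sketch below is not part of the original script: `tile_offsets` and `stitch` are hypothetical helper names, the page is modelled as a plain list of rows, and only vertical scrolling is considered. It assumes the browser clamps the final scroll position to `total_height - viewport_height`, which is the case the correction in util.py is meant to handle.

```python
def tile_offsets(total_height, viewport_height):
    """Return (requested_scroll_y, paste_y) pairs, one per vertical capture."""
    pairs = []
    y = 0
    while y < total_height:
        # A browser cannot scroll past the bottom of the page, so the last
        # capture really starts at total_height - viewport_height.
        # max(..., 0) is an added guard for pages shorter than the viewport.
        paste_y = min(y, max(total_height - viewport_height, 0))
        pairs.append((y, paste_y))
        y += viewport_height
    return pairs


def stitch(page_rows, viewport_height):
    """Simulate capture-and-paste on a page modelled as a list of rows."""
    total_height = len(page_rows)
    stitched = [None] * total_height
    for _requested_y, paste_y in tile_offsets(total_height, viewport_height):
        # Simulated screenshot: the rows visible at the clamped scroll position
        shot = page_rows[paste_y:paste_y + viewport_height]
        # Paste the capture where it was actually taken
        stitched[paste_y:paste_y + len(shot)] = shot
    return stitched


if __name__ == "__main__":
    # 2500 px page, 1000 px viewport: the third capture is pasted at 1500
    print(tile_offsets(2500, 1000))  # [(0, 0), (1000, 1000), (2000, 1500)]
```

The last capture overlaps the previous one by 500 rows, and pasting it at 1500 instead of 2000 is what keeps the bottom of the stitched image aligned.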

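## Installing PIL

PIL itself is no longer developed; its fork, Pillow, is said to have taken over that role.
Either of the two commands below appears to work. The first installs Pillow directly and is the more direct route; the second installs a separate PyPI package named `image`, which, as far as I can tell, pulls in Pillow as a dependency.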

```
pip install pillow
```
or
```
pip install image
```
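
After installing, a quick check along these lines can confirm that the `PIL` module util.py imports is actually available. This is only a sketch: `pillow_version` is a hypothetical helper name, and it assumes the installed Pillow exposes `PIL.__version__` (if it does not, the function falls back to the string "unknown").

```python
import importlib


def pillow_version():
    """Return the installed Pillow version string, or None if PIL is missing."""
    try:
        module = importlib.import_module("PIL")
    except ImportError:
        return None
    # Assumption: recent Pillow releases define PIL.__version__
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    version = pillow_version()
    if version is None:
        print("Pillow is not installed; try: pip install pillow")
    else:
        print("Pillow version:", version)
```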

## References
- http://seleniumpythonqa.blogspot.com/2015/08/generate-full-page-screenshot-in-chrome.html
- https://stackoverflow.com/questions/41721734/take-screenshot-of-full-page-with-selenium-python-with-chromedriver
- https://ko.wikipedia.org/wiki/Python_Imaging_Library
