Capturing a full-page scrolling screenshot of a website with Python Selenium WebDriver
Author: 관리자 (admin) · Posted: 21-04-30 13:19
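A quick search turns up plenty of ways to capture an entire web page with Python Selenium WebDriver.
However, most of those snippets only work in headless mode.
So I am saving here a full page capture source that works in normal (non-headless) mode.
- Source: http://seleniumpythonqa.blogspot.com/2015/08/generate-full-page-screenshot-in-chrome.html
It moves down the page's y coordinate one viewport at a time, captures each visible part stitch by stitch, and pastes the parts together into one large image.
In short, it is a piece of code made with real craftsmanship.
The code requires PIL (Python Imaging Library); installation instructions are below.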
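The scroll-and-stitch idea described above can be sketched without a browser at all. The function below is only an illustration: `plan_tiles` and its tuple layout are hypothetical names, not part of the original source. It computes where to scroll and where each capture would be pasted, assuming the page and viewport sizes are already known in pixels.

```python
def plan_tiles(total_width, total_height, viewport_width, viewport_height):
    """Return (scroll_x, scroll_y, paste_x, paste_y) for every capture.

    Hypothetical helper for illustration only; all sizes are in pixels.
    """
    if viewport_width <= 0 or viewport_height <= 0:
        raise ValueError("viewport size must be positive")

    # Lowest scroll position the browser can actually reach.
    # Clamped to 0 as an assumption for pages shorter than the viewport.
    max_scroll_y = max(total_height - viewport_height, 0)

    tiles = []
    y = 0
    while y < total_height:
        x = 0
        while x < total_width:
            # The last row cannot scroll a full viewport, so its capture
            # really shows the bottom-aligned part of the page.
            paste_y = min(y, max_scroll_y)
            tiles.append((x, y, x, paste_y))
            x += viewport_width
        y += viewport_height
    return tiles


# A 1500 px tall page seen through a 600 px tall viewport needs 3 captures.
for tile in plan_tiles(800, 1500, 800, 600):
    print(tile)
```

The last tile is pasted at y = 900 rather than 1200 because the browser stops scrolling at the bottom of the page, so that capture overlaps the previous one.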
test.py
```
"""
This script uses a simplified version of the one here:
https://snipt.net/restrada/python-selenium-workaround-for-full-page-screenshot-using-chromedriver-2x/
It contains the *crucial* correction added in the comments by Jason Coutu.
"""
import sys
import unittest

from selenium import webdriver

import util


class Test(unittest.TestCase):
    """ Demonstration: Get Chrome to generate fullscreen screenshot """

    def setUp(self):
        self.driver = webdriver.Chrome()

    def tearDown(self):
        self.driver.quit()

    def test_fullpage_screenshot(self):
        ''' Generate document-height screenshot '''
        url = "http://effbot.org/imagingbook/introduction.htm"
        self.driver.get(url)
        util.fullpage_screenshot(self.driver, "test.png")


if __name__ == "__main__":
    unittest.main(argv=[sys.argv[0]])
```
util.py
```
import os
import time

from PIL import Image


def fullpage_screenshot(driver, file):
    print("Starting chrome full page screenshot workaround ...")

    # Measure the whole document and the visible viewport (CSS pixels).
    total_width = driver.execute_script("return document.body.offsetWidth")
    total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
    viewport_width = driver.execute_script("return document.body.clientWidth")
    viewport_height = driver.execute_script("return window.innerHeight")
    print("Total: ({0}, {1}), Viewport: ({2},{3})".format(
        total_width, total_height, viewport_width, viewport_height))

    # Split the page into viewport-sized rectangles (left, top, right, bottom).
    rectangles = []
    i = 0
    while i < total_height:
        ii = 0
        top_height = i + viewport_height
        if top_height > total_height:
            top_height = total_height

        while ii < total_width:
            top_width = ii + viewport_width
            if top_width > total_width:
                top_width = total_width

            print("Appending rectangle ({0},{1},{2},{3})".format(ii, i, top_width, top_height))
            rectangles.append((ii, i, top_width, top_height))
            ii = ii + viewport_width

        i = i + viewport_height

    stitched_image = Image.new('RGB', (total_width, total_height))
    previous = None
    part = 0

    for rectangle in rectangles:
        # The first rectangle is already in view; scroll for all later ones.
        if previous is not None:
            driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
            print("Scrolled To ({0},{1})".format(rectangle[0], rectangle[1]))
            time.sleep(0.2)  # give the page a moment to repaint

        file_name = "part_{0}.png".format(part)
        print("Capturing {0} ...".format(file_name))
        driver.get_screenshot_as_file(file_name)
        screenshot = Image.open(file_name)

        # The browser cannot scroll past the end of the page, so the last row
        # is pasted bottom-aligned instead of at its nominal y position.
        if rectangle[1] + viewport_height > total_height:
            offset = (rectangle[0], total_height - viewport_height)
        else:
            offset = (rectangle[0], rectangle[1])

        print("Adding to stitched image with offset ({0}, {1})".format(offset[0], offset[1]))
        stitched_image.paste(screenshot, offset)

        # Release the image before deleting its temporary file.
        del screenshot
        os.remove(file_name)
        part = part + 1
        previous = rectangle

    stitched_image.save(file)
    print("Finishing chrome full page screenshot workaround...")
    return True
```
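One caveat worth noting: the sizes read through `execute_script` are CSS pixels, while on high-DPI displays the screenshot files may come back larger by the factor `window.devicePixelRatio`. The sketch below is an assumption about how the paste offsets could be adjusted in that case; `css_to_device_pixels` is a hypothetical helper and is not part of the original util.py.

```python
def css_to_device_pixels(offset, device_pixel_ratio):
    """Scale an (x, y) offset from CSS pixels to screenshot (device) pixels.

    Hypothetical helper for illustration. The ratio could be read with
    driver.execute_script("return window.devicePixelRatio").
    """
    if device_pixel_ratio <= 0:
        raise ValueError("device_pixel_ratio must be positive")
    x, y = offset
    return (int(round(x * device_pixel_ratio)),
            int(round(y * device_pixel_ratio)))


# With a ratio of 2, a tile meant for CSS offset (0, 900)
# would be pasted at device offset (0, 1800).
print(css_to_device_pixels((0, 900), 2))
```

If this adjustment were used, the size passed to `Image.new` would presumably need the same scaling so that the canvas matches the captured tiles.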
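## Installing PIL
PIL itself is no longer developed; Pillow, a fork of PIL, has taken over its role.
Either of the two commands below seems to work. `pip install pillow` is the direct route, while `image` is a separate package that appears to pull in Pillow as a dependency.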
```
pip install pillow
```
or
```
pip install image
```
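After installing, a quick way to confirm that the `PIL` module used by util.py is importable might look like the following. This is a minimal sketch; `pillow_available` is a hypothetical name chosen for illustration.

```python
import importlib.util


def pillow_available():
    """Return True if the PIL package (provided by Pillow) can be found."""
    return importlib.util.find_spec("PIL") is not None


print("PIL importable:", pillow_available())
```

If this prints `False`, `from PIL import Image` in util.py would fail with an ImportError.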
## References
http://seleniumpythonqa.blogspot.com/2015/08/generate-full-page-screenshot-in-chrome.html
https://stackoverflow.com/questions/41721734/take-screenshot-of-full-page-with-selenium-python-with-chromedriver
https://ko.wikipedia.org/wiki/Python_Imaging_Library