
puppeteer


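Scraping pages or API data

The page must be server-side rendered (SSR), because a plain HTTP client only receives the HTML the server sends. Any request method (GET, POST, and so on) works.

Using axios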

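const axios = require("axios");
const fs = require("fs/promises");
// Note: the part after "#" is a URL fragment and is never sent to the server
const url = "https://music.163.com/#/song?id=30431367";
// axios.get returns a promise (it takes no callback); the body is in response.data
axios
  .get(url)
  .then((response) => fs.writeFile("./bilibili.html", response.data))
  .catch((error) => console.error(error));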
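
Since POST requests work as well as GET, here is a minimal sketch of saving an API response fetched with POST. The helper name saveApiData, the endpoint URL, the payload, and the output file are all hypothetical; `client` is assumed to be an axios-like object such as require("axios").

```javascript
const fs = require("fs/promises");

// Hypothetical helper: POST `payload` to `url` and save the response body as JSON.
// `client` is assumed to expose an axios-style post(url, data) that resolves to { data }.
async function saveApiData(client, url, payload, outFile) {
  const response = await client.post(url, payload);
  await fs.writeFile(outFile, JSON.stringify(response.data, null, 2));
  return response.data;
}

// Usage (needs `npm install axios`; the URL below is a placeholder):
// saveApiData(require("axios"), "https://example.com/api/search", { keyword: "puppeteer" }, "./result.json");
```

Passing the client in as a parameter keeps the helper easy to try out with a stub before pointing it at a real endpoint.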

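Introduction to Puppeteer

Puppeteer is a Node library that provides a high-level API for controlling Chrome/Chromium over the DevTools Protocol. Puppeteer runs headless by default, but it can be configured to run full (non-headless) Chrome/Chromium.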

Usage notes

puppeteer-core

puppeteer-core does not download a browser, so you must pass the absolute path of the Chrome executable through the executablePath launch option.

const puppeteer = require("puppeteer-core");
// carlo ships a helper that locates a locally installed Chrome
const findChrome = require("./node_modules/carlo/lib/find_chrome");

(async function () {
  // findChrome resolves to an object whose executablePath points at Chrome
  const { executablePath } = await findChrome({});
  const browser = await puppeteer.launch({
    // Alternatively, hard-code the path:
    // executablePath: "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
    executablePath,
    headless: false, // show the browser window
    defaultViewport: null, // use the window size instead of the default 800x600 viewport
    args: ["--start-fullscreen"],
    ignoreHTTPSErrors: true,
  });
  const page = await browser.newPage();
  await page.goto("https://www.bilibili.com/");
  await page.screenshot({
    path: "./bilibili.png",
    type: "png",
  });
  // Wait for the browser to shut down before the script exits
  await browser.close();
})();

Puppeteer overview

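The Puppeteer API is hierarchical and mirrors the structure of the browser: a Browser holds BrowserContexts, a BrowserContext holds Pages, and a Page holds Frames.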

Reference: https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#puppeteer-vs-puppeteer-core
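
To make the layering concrete, the sketch below walks a Browser down through its BrowserContexts, Pages, and Frames. The function name describeBrowser is hypothetical; `browser` is assumed to be an object returned by puppeteer.launch() or puppeteer.connect().

```javascript
// Hypothetical helper: list every context, page, and frame of a Puppeteer Browser.
async function describeBrowser(browser) {
  const lines = [];
  // Browser -> BrowserContext (similar to a separate browser session)
  for (const context of browser.browserContexts()) {
    lines.push("BrowserContext");
    // BrowserContext -> Page (a tab); pages() returns a promise
    for (const page of await context.pages()) {
      lines.push("  Page: " + page.url());
      // Page -> Frame (the main frame plus any iframes)
      for (const frame of page.frames()) {
        lines.push("    Frame: " + frame.url());
      }
    }
  }
  return lines;
}

// Usage, inside an async function after launching:
// console.log((await describeBrowser(browser)).join("\n"));
```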

Usage

Connect to an already running browser with connect
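
A minimal sketch of attaching to a running Chrome, assuming Chrome was started with --remote-debugging-port=9222. The function name listOpenPages and the port are assumptions for illustration; `puppeteer` is assumed to be require("puppeteer-core").

```javascript
// Hypothetical helper: attach to a running Chrome and return the URLs of its open tabs.
// Assumes Chrome was started with --remote-debugging-port=9222.
async function listOpenPages(puppeteer, browserURL = "http://127.0.0.1:9222") {
  // connect() attaches to an existing browser instead of launching a new one
  const browser = await puppeteer.connect({ browserURL, defaultViewport: null });
  try {
    const pages = await browser.pages();
    return pages.map((page) => page.url());
  } finally {
    // disconnect() detaches Puppeteer but leaves Chrome running
    await browser.disconnect();
  }
}

// Usage:
// listOpenPages(require("puppeteer-core")).then(console.log);
```

Calling disconnect() rather than close() matters here, since close() would shut down the browser you attached to.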

Launch Chrome with all of your extensions and signed-in accounts
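
One way to do this, sketched under assumptions, is to point Puppeteer at an existing Chrome profile through the userDataDir launch option and stop it from disabling extensions. The helper name buildProfileLaunchOptions and the profile path in the usage comment are hypothetical, and Chrome must not already be running with the same profile.

```javascript
// Hypothetical helper: build launch options that reuse an existing Chrome profile.
function buildProfileLaunchOptions(executablePath, userDataDir) {
  return {
    executablePath,
    userDataDir, // profile directory that holds logins, cookies, and extensions
    headless: false,
    defaultViewport: null,
    // Puppeteer adds --disable-extensions by default; drop it so extensions load
    ignoreDefaultArgs: ["--disable-extensions"],
  };
}

// Usage (the profile path is a placeholder; adjust it for your machine):
// const browser = await puppeteer.launch(
//   buildProfileLaunchOptions(
//     "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
//     "C:\\Users\\me\\AppData\\Local\\Google\\Chrome\\User Data"
//   )
// );
```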

To inspect CDP (Chrome DevTools Protocol) traffic, enable Protocol Monitor in DevTools under Settings > Experiments.
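
The same protocol messages can also be observed from a script. The sketch below opens a raw CDP session for a page and logs each outgoing network request; the function name logNetworkRequests is hypothetical, and `page` is assumed to be a Puppeteer Page on a version where page.target().createCDPSession() is available.

```javascript
// Hypothetical helper: log every request a page makes, using a raw CDP session.
async function logNetworkRequests(page, log = console.log) {
  // Open a CDP session attached to this page's target
  const client = await page.target().createCDPSession();
  // The Network domain must be enabled before it emits events
  await client.send("Network.enable");
  client.on("Network.requestWillBeSent", (event) => {
    log(`${event.request.method} ${event.request.url}`);
  });
  return client;
}

// Usage, inside an async function:
// await logNetworkRequests(page);
// await page.goto("https://www.bilibili.com/");
```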