Using ProxyCrawl to get Webpages

<?php

namespace TelfordCodes;

use GuzzleHttp\Client;
use Monolog\Logger;

class CrawlService
{
    private const CRAWL_SERVICE_URL_FMT = "https://api.proxycrawl.com/?token=%s&url=%s";

    // Constructor property promotion (PHP 8.0+) stores both dependencies.
    public function __construct(private Client $client, private Logger $logger)
    {
    }

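    /**
     * Fetch a page through the ProxyCrawl API.
     *
     * Returns the response body, or an empty string if the request fails.
     */
    public function getResponse(string $base_url): string
    {
        // The target URL travels as a query parameter, so it must be percent-encoded.
        $enc_base_url = urlencode($base_url);
        $service_url = sprintf(
            self::CRAWL_SERVICE_URL_FMT,
            $_ENV['CRAWL_SERVICE_TOKEN'],
            $enc_base_url
        );

        try {
            $response = $this->client->get($service_url);
            $text = $response->getBody()->getContents();
            $this->logger->info("Successful request for $base_url");
        } catch (\Exception $e) {
            // Guzzle throws on connection errors and, by default, on 4xx/5xx responses.
            $text = "";
            $this->logger->error("Failed request for $base_url: " . $e->getMessage());
        }

        return $text;
    }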
}   

The class can then be used with code like the following. The ColoredLineFormatter for Monolog makes the log easier to read by visually distinguishing successes from errors.

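<?php

// Assumes Composer's autoloader lives in vendor/ next to this script.
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;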
use Monolog\Logger;
use Monolog\Handler\StreamHandler;
use Bramus\Monolog\Formatter\ColoredLineFormatter;

use TelfordCodes\CrawlService;

// Load environment variables from .env, including the ProxyCrawl API token.
(Dotenv\Dotenv::createImmutable(__DIR__))->load(); 

// Client is already imported above, so the short name is enough.
$client = new Client();

$logger = new Logger("CrawlService");
$handler = new StreamHandler(__DIR__.'/log/crawler.log', Logger::DEBUG);
$handler->setFormatter(new ColoredLineFormatter());
$logger->pushHandler($handler);

$crawl_service = new CrawlService($client, $logger);
$url = "https://site-you-need-goes-here";
// $resp holds the page body, or an empty string if the request failed.
$resp = $crawl_service->getResponse($url);
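
To see why getResponse encodes the target URL before sending it, here is a minimal sketch of how the service URL gets assembled, with no Guzzle or Monolog needed. The helper name buildServiceUrl, the token value and the example.com address are all made up for illustration. It also swaps the sprintf/urlencode pair from the class for PHP's built-in http_build_query, which should give the same encoding for values like these.

```php
<?php

// Hypothetical helper, not part of CrawlService: builds the same kind of
// request URL that getResponse() sends to the crawl service.
function buildServiceUrl(string $token, string $targetUrl): string
{
    // http_build_query() percent-encodes every value, so the "?", "&" and "="
    // inside the target URL cannot be mistaken for parts of the outer query.
    $query = http_build_query(
        ['token' => $token, 'url' => $targetUrl],
        '',
        '&'
    );

    return 'https://api.proxycrawl.com/?' . $query;
}

// Placeholder token and target URL, for illustration only.
$serviceUrl = buildServiceUrl(
    'example-token',
    'https://example.com/search?q=php&page=2'
);

echo $serviceUrl, PHP_EOL;
```

Without the encoding, the &page=2 part of the target URL would be read as a separate parameter of the service request instead of as part of the page you want fetched.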