Can you elaborate on how a "concurrent session" should be understood?
Does that just mean concurrent requests?
Or can I have more concurrent requests, with some of them sharing an anti-bot session (like a Cloudflare cookie)?
In the case of anti-detect browsers, a session is basically a browser instance. Your requests happen through CDP navigations like page.goto("https://google.com") or through clicks on elements, so concurrency is essentially the number of open browsers. It doesn't matter how many requests you make with a given browser. That said, IMHO you should fully mimic human usage and only use one tab at a time per browser when scraping.
The best way is to perform requests one after another within a browser.
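That sequential, one-tab-at-a-time pattern can be sketched in a few lines of Python. This is a minimal sketch, not vendor code: `page` stands for whatever your automation framework gives you (e.g. a Playwright `Page`), and the delay bounds are made-up values you would tune yourself.

```python
import random
import time

def scrape_sequentially(page, urls, min_delay=2.0, max_delay=6.0):
    """Visit each URL one after another in a single tab, pausing a
    random, human-like interval between navigations."""
    results = []
    for i, url in enumerate(urls):
        page.goto(url)                  # one navigation at a time
        results.append(page.content())  # grab the rendered HTML
        if i < len(urls) - 1:           # no need to pause after the last page
            time.sleep(random.uniform(min_delay, max_delay))
    return results
```

With Playwright you would typically obtain `page` after something like `chromium.connect_over_cdp(ws_endpoint)`, where `ws_endpoint` is the CDP URL your anti-detect browser exposes; check the vendor's docs for the exact endpoint.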
Let me know if anything is still unclear.
OK, let's say I have one browser instance running. Does the API allow me to make each of my requests with a different cookie set? For instance, could I sign in to a website as two different users and make requests on their behalf? From what you're saying, it should be possible, I guess?
And how do services like Kameleo work in general? I mean, I open the browser instance and do page.goto('mysite.com'). If an anti-bot captcha appears while fetching that site, will Kameleo detect and solve it? So from my point of view, it would just take a bit longer, but I'd get the page I expect, as if no captcha had been there. Is that correct?
One browser instance has one browsing context, which means one cookie set. If you want to sign in to a website as two different users and make requests on their behalf, you need to launch two browsers. We do everything we can to keep these browsers' resource usage low, but this is the only "safe" way we can guarantee excellent bypass capabilities and maximize the success rate. Note that Kameleo features persistent browsing contexts: you can save the cookies and session data as a browser profile and relaunch it at any time.
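The "one isolated browser per account" rule above can be wrapped in a small helper. This is a hedged sketch, not a real Kameleo API: `launch_browser` is a placeholder for however you actually start an isolated profile (e.g. via the vendor's local API plus Playwright/Puppeteer), and `task` is your own scraping routine.

```python
def run_per_account(accounts, launch_browser, task):
    """Run `task` for each account in its own browser instance, so every
    account keeps its own browsing context (and therefore cookie set)."""
    results = {}
    for account in accounts:
        browser = launch_browser(account)  # fresh, isolated profile
        try:
            results[account["user"]] = task(browser, account)
        finally:
            browser.close()  # never leak one account's state into the next
    return results
```

Because profiles are persistent, `launch_browser` could just as well relaunch a previously saved profile so each account keeps its logged-in session across runs.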
A captcha solver is currently not bundled with Kameleo or other anti-detect browsers, but we try to prevent captchas from appearing in the first place. See this video: https://youtu.be/euzfCRNapwI?si=JzpNQ7ppUGnxQfot&t=12 Kameleo is simply treated as a human, so there is usually no captcha to solve at all. If you want, you can auto-install captcha-solver browser extensions into Kameleo: https://help.kameleo.io/hc/en-us/articles/4418166326417-Getting-started-with-Kameleo-Automation#h_01HPS4HFEZSDS9JX25V4QDY4GD
I once used chrome-nodriver, which uses the CDP protocol, and it was possible to set cookies in the cookie jar before each request, so making requests on behalf of several users was possible despite 1 browser = 1 context.
As for captchas: if I request an anti-bot-protected site and a captcha is shown in my Kameleo browser instance, what does Kameleo do in such a case? Does it try to fetch the page using a different proxy, or just return it as is?
As Kameleo supports integration with Playwright and Puppeteer (both of which let you communicate with the browser through CDP), you can actually change the cookies before each request. I had never thought about this use case; I've now added it to my experimental list. However, the main point of using an anti-detect browser is to fully mimic human behavior, and for that I think you should stick with the default browser context.
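If you do experiment with per-request cookie swapping in a single browser (against the advice above), Playwright's real `BrowserContext.clear_cookies()` and `add_cookies()` calls are all you need. A sketch, where `cookie_jars` is your own per-user storage of Playwright-style cookie dicts:

```python
def switch_user(context, cookie_jar):
    """Swap the browsing context's cookies for another user's jar.
    `context` is a Playwright BrowserContext (sync API)."""
    context.clear_cookies()          # drop the previous user's session
    context.add_cookies(cookie_jar)  # install the next user's cookies

def fetch_as_each_user(context, page, url, cookie_jars):
    """Fetch the same URL once per user, swapping cookie sets in between."""
    pages = {}
    for user, jar in cookie_jars.items():
        switch_user(context, jar)
        page.goto(url)
        pages[user] = page.content()
    return pages
```

Note that this only swaps cookies; localStorage, cache, and other context state still bleed between "users", which is one more reason the vendor recommends separate browsers for truly isolated sessions.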
Kameleo provides two custom-built browsers that you can drive with Selenium, Puppeteer, or Playwright. The automation behaves like normal Playwright, but the browser looks like a real human, which prevents most captchas from appearing. If a captcha does appear, you currently need to solve it (possibly with an extension). In the near future we will add captcha solving to the browser as well: the "request" will return with the page loading the captcha, and after a short time Kameleo will drive the browser to solve it, so you can then access the content you want.
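Since the "request" may return while the captcha page is still showing, calling code can detect that state and give an extension-based solver time to act. The marker list below is a rough heuristic of my own, not something the browser ships:

```python
import time

# Hypothetical markers for common captcha widgets; extend as needed.
CAPTCHA_MARKERS = ("g-recaptcha", "h-captcha", "cf-turnstile", "challenge-form")

def looks_like_captcha(html):
    """Crude check: does the page embed a known captcha widget?"""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)

def goto_with_captcha_grace(page, url, wait_seconds=15):
    """Navigate, and if a captcha page came back, give a solver
    extension some time to act before re-reading the content."""
    page.goto(url)
    html = page.content()
    if looks_like_captcha(html):
        time.sleep(wait_seconds)  # let the solver extension work
        html = page.content()     # re-read after the grace period
    return html
```

In production you would likely poll in a loop with a timeout rather than sleep once, but the shape of the flow is the same.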
Given that this post was written by the CEO of Kameleo, there is a clear conflict of interest.
I would only trust a comparison like this if it was written by someone who isn't affiliated with any of the contestants and the post contains no special offers or affiliate links.
The last 2 Kameleo reviewers gave it 1 star and I'm more inclined to believe them than the CEO of the company:
https://www.trustpilot.com/review/kameleo.io
I personally approve every guest post published here on The Web Scraping Club and don't find any conflict of interest in this article. Here are some criteria that you have to take into consideration when choosing any anti-detect browser and a recap of the actual prices that I personally double-checked.
We all know how Trustpilot can be manipulated, so I won't use that website.
If you want to have an idea of the performances of anti-detect browsers from a third-party perspective, you can have a look at this article, even if it's a bit outdated and needs a refresh: https://substack.thewebscraping.club/p/anti-detect-browsers-fingerprint-tests
Thank you Pier, I find your article much more trustworthy than this one.
By conflict of interest, I don't mean to say that any part of the article is false, but that it meets the definition of a conflict.
Wikipedia: "A conflict of interest exists if the circumstances are reasonably believed (on the basis of past experience and objective evidence) to create a risk that a decision may be unduly influenced by other, secondary interests, and not on whether a particular individual is actually influenced by a secondary interest."
But if you still aren't convinced, let me give you some examples of bias in this article that favors Kameleo and is likely a result of the conflict:
1. It only mentions the annual discount for Kameleo despite the fact that all 5 other browsers also have discounts when purchasing annual or bi-annual subscriptions.
2. It only mentions a cheaper plan for Kameleo despite the fact that all of the other browsers have cheaper (and even free PAYG) plans. The START Kameleo plan isn't even adequate for the real-world scenario of 50 concurrent browsers so it shouldn't have been mentioned at all.
What is the Chrome version in Kameleo today? How often does it update kernels?
The current Kameleo version is v3.4.4, which ships with Chroma 133. It can successfully emulate Chrome 134 fingerprints as well. We are releasing Kameleo 4.0 around April 7th with Chroma 135. Kameleo 4.0 will include a feature called Multi-Kernel, which lets Kameleo download the necessary kernel version at runtime. This enables two things:
- You can perfectly match the kernel version with the emulated fingerprint to prevent "CSS version" detection.
- We will be able to push out new kernel versions on the same day Chrome releases on the stable branch, without depending on Kameleo.CLI release dates.
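The Multi-Kernel idea (matching the downloaded kernel to the emulated fingerprint) boils down to version selection. A hypothetical sketch of that matching logic, not Kameleo's actual code:

```python
import re

def pick_kernel(fingerprint_ua, available_kernels):
    """Choose the kernel whose major version matches the Chrome major
    version in the fingerprint's user agent, so the engine's real CSS/JS
    feature set agrees with what the fingerprint claims."""
    match = re.search(r"Chrome/(\d+)", fingerprint_ua)
    if match is None:
        raise ValueError("no Chrome version found in user agent")
    wanted = int(match.group(1))
    if wanted in available_kernels:
        return wanted  # exact match: no "CSS version" mismatch possible
    # Otherwise fall back to the closest kernel already on disk.
    return min(available_kernels, key=lambda k: abs(k - wanted))
```

An exact match avoids the situation where the user agent claims Chrome 135 but the rendering engine only supports Chrome 133's CSS features, which is exactly what "CSS version" detection looks for.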