Comments extraction issue #198
Comments
Hi @gqwang16, Facebook gives you a limit of searches without being logged in. I use a proxy to get information. |
Hi @kevinzg, I have a problem with the comments. They are bringing information from other posts. Do you know how I can fix it? |
Hi, could you please share how to use a proxy to get information? I am
new to Python and would really appreciate it if you could send me some
resources or even Python examples on extracting Facebook comments.
Thanks a lot for the help!
My email is ***@***.***
|
I use a local service as a rotating proxy. In facebook_scraper.py change: `def __init__(self, session=None, requests_kwargs=None):`
You must change the {ip} and {port}. |
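For anyone unsure what the proxy settings look like, here is a minimal sketch of a `requests_kwargs` value that points both HTTP and HTTPS traffic at one proxy. The `proxies` mapping is the standard format accepted by the `requests` library; the helper name `build_requests_kwargs` and the address `127.0.0.1:5566` are placeholders made up for illustration, and how the scraper consumes `requests_kwargs` is assumed from the signature quoted above rather than confirmed.

```python
def build_requests_kwargs(ip, port, scheme="http"):
    """Build keyword arguments that route requests through one proxy.

    Hypothetical helper: ip and port are the address of your own
    (rotating) proxy service, standing in for {ip} and {port}.
    """
    proxy_url = f"{scheme}://{ip}:{port}"
    # requests expects a mapping from URL scheme to proxy URL
    return {"proxies": {"http": proxy_url, "https": proxy_url}}


# Placeholder address, replace with your own proxy
requests_kwargs = build_requests_kwargs("127.0.0.1", 5566)
print(requests_kwargs)
```

If the proxy service rotates its exit IP on its own, a single fixed address like this should be enough on the client side.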
Hi @neon-ninja, If I activate the comments, after a while Facebook closes the connection and stops extracting information. For this reason I use a proxy. |
Gotcha. Yes, extracting comments results in more requests to Facebook servers, which results in triggering a temporary IP ban faster |
Please @lgjluis can you share a working script or simple example on how to use a rotating proxy? |
Twint has support for tor, and reloading tor if IP banned - might be worth porting here |
Hi @Christian-Nja, I use a Docker container running a rotating proxy. |
Ok thank you. So, just to be sure of what technique to follow.
Is this correct? You can't get the benefit of being logged in and IP rotation at once. |
Sounds about right. Depending on what you're trying to do, #212 might also be useful. Additionally, if you had multiple accounts, cookie rotation might work |
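A rough sketch of what cookie rotation could look like, assuming you have one exported cookie file per account. Everything here is hypothetical (the helper `rotate_cookies`, the file names, the page names); it only shows the round-robin pairing and makes no requests, so how each cookie file is handed to the scraper is left open.

```python
from itertools import cycle


def rotate_cookies(cookie_files, targets):
    """Pair each target page with the next cookie file, round-robin."""
    cookie_files = list(cookie_files)
    if not cookie_files:
        raise ValueError("need at least one cookie file")
    pool = cycle(cookie_files)
    for target in targets:
        yield target, next(pool)


# Hypothetical file and page names, for illustration only
plan = list(rotate_cookies(["account_a.txt", "account_b.txt"],
                           ["page1", "page2", "page3"]))
print(plan)
```

Each account then only sees a fraction of the requests, which may delay rate limiting, though nothing in this thread confirms how much it helps.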
@neon-ninja @lgjluis When I call get_posts() just once, does this imply a single request, or does the function internally launch a loop of requests that I can't slow down? |
@abubelinha actually, get_posts returns a generator - and requests are only made when you iterate through it. Note that each page contains 4 posts. So,

    import time
    from facebook_scraper import get_posts

    for post in get_posts("Nintendo"):
        print(post.get("post_id"))
        time.sleep(.25)

should add a one second delay in between each request, since four 0.25 second sleeps pass per page of 4 posts. |
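The same idea can be wrapped in a small reusable generator, so the delay lives in one place. This is only a sketch: `throttle` is a made-up helper, not part of facebook_scraper, and the `sleep` parameter exists just so the pause can be swapped out when testing. Because the wrapped iterable is consumed lazily, the pause happens before the next item (and so the next request) is pulled.

```python
import time


def throttle(iterable, delay, sleep=time.sleep):
    """Yield items from iterable, pausing `delay` seconds after each one."""
    for item in iterable:
        yield item
        # Runs when the caller asks for the next item,
        # i.e. before the underlying iterable is advanced again
        sleep(delay)


# Offline demonstration with a plain list and a recording fake sleep
pauses = []
items = list(throttle(["a", "b", "c"], 0.25, sleep=pauses.append))
print(items, pauses)
```

With the scraper this would be used as `for post in throttle(get_posts("Nintendo"), 0.25): ...`, which, with 4 posts per page, works out to roughly one second per page request.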
This is what brought me here today. I was looking to see whether you were planning to implement this feature; it would be a tremendous help. |
I use get_posts("the page I scrape", pages=3, options={'comments': True}) to extract the comments. However, I get a nonzero comment count but nothing in "comments_full". Does anyone know the reason, or how to extract the comments?