rozetka 3 actors added #533
base: trunk
Please do not use a custom editorconfig, eslintrc, or prettierrc. Also, please do not use TypeScript. Actors have to be directly usable without any compilation steps.
Dear Aleš,
I’m a developer at Geniusee and we are partners with Apify.
We usually develop our Apify actors in TypeScript and include the dist folder, so that the actor doesn't require any build steps before it runs. Does this still mean we should rewrite the TypeScript code in JavaScript? If it's crucial for the project requirements, we'll surely do so.
Sincerely yours,
Vlad
On 14 Apr 2021, 15:53 +0300, Aleš Roubíček ***@***.***> wrote:
> Please do not use custom editorconfig, eslintrc nor prettierrc. Also, please, do not use TypeScript. Actors have to be directly usable without any compilation steps.
Oh, I see Hlidac has the dist/ directory in .gitignore, so I missed that the dist directory hadn't actually been pushed to the repo; sorry about that. I could simply rename it to 'build/' so that it will definitely be pushed. In that case, will the TypeScript still need to be replaced with JavaScript? Thanks for your time!
Hey @rarous,
Looks pretty good. I haven't checked everything in depth, nor have I tested whether it works. Please look at the comments. Thanks.
actors/rozetka-count/src/main.js
await crawler.run();
log.info('Crawl finished.');

await Apify.pushData({ OUTPUT: await getOrIncStatsValue() })
@zpelechova Is this how the dataset should look?
@zpelechova Since the specs gave no details on the output of the count actor, it would be great to know the correct way to do this. I saw another actor use { totalCount: value }. Is that how it should be done?
actors/rozetka-daily/src/consts.js
@@ -0,0 +1,42 @@
export const LABELS = {
This second actor looks like most of its code is copy-pasted from the first one, which will make maintenance harder. I would use a single folder for both and just change the behaviour via input or an env var. cc @rarous
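To illustrate one folder whose behaviour is switched via input or env var, here is a minimal sketch. It assumes a hypothetical `type` input field (values "COUNT" or "DAILY") and a hypothetical `SCRAPER_MODE` env var as the fallback; neither name comes from the project's specs.

```javascript
// Hypothetical mode resolution for a single Rozetka actor folder.
// Assumptions: the input may carry a `type` field ("COUNT" or "DAILY");
// the SCRAPER_MODE env var name is illustrative, not part of any spec.
const MODES = Object.freeze({ COUNT: 'COUNT', DAILY: 'DAILY' });

function resolveMode(input = {}, env = {}) {
    // Input wins over the environment, so a scheduled run can override a default.
    const raw = (input && input.type) || env.SCRAPER_MODE || MODES.DAILY;
    const mode = String(raw).toUpperCase();
    if (!Object.values(MODES).includes(mode)) {
        throw new Error(`Unknown mode "${raw}", expected one of ${Object.values(MODES).join(', ')}`);
    }
    return mode;
}

// Usage: in a real actor, input could come from Apify.getInput() and env from process.env.
console.log(resolveMode({ type: 'count' }, {}));         // COUNT
console.log(resolveMode({}, { SCRAPER_MODE: 'DAILY' })); // DAILY
console.log(resolveMode(null, {}));                      // DAILY
```

Validating the mode up front makes a typo in a schedule's input fail the run immediately instead of silently falling back to the daily behaviour.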
Yeah, it looks like having separate actors for the count functionality is not a good idea.
Well, it could be multiple actors, but it definitely should not be multiple folders. I will leave that up to you, though.
I guess it would be most efficient to count the results and scrape the data in parallel, since the crawling logic has to be very similar. I just don't clearly understand how we should handle the output if one actor serves both purposes (though there are plenty of ways to handle this; for example, we could save the results to separate named datasets for the count and the daily actors). But reusing the code is great, and I'm very glad that I'm allowed to do that here.
So, to sum up, which way should I choose: separate actors with some shared code, or a single actor (in which case more specs about the output should be provided)?
We already have the common library for reusable code, but it is meant for code that is reusable across all or most actors.
IMHO this case should be handled just by a type: "COUNT" input parameter. It will be one actor with multiple modes (we already have this for Black Friday scraping in older actors), scheduled with different input parameters. This mode will just write to the dataset but skip the Keboola upload step.
Sorry for the inconvenience; we are still figuring out the process and its shape. The count functionality is a new requirement for internal benchmarking of scraped data.
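A sketch of the single-actor, multi-mode flow described here, in which the count mode writes to the dataset but skips the Keboola upload. `finishRun` is a hypothetical name, and `pushData` and `uploadToKeboola` are injected placeholders: in a real actor the former could be Apify.pushData, while the latter stands in for the project's upload step, whose actual name and signature are not shown in this thread.

```javascript
// Hypothetical finishing step for one actor running in several modes.
// pushData and uploadToKeboola are injected so the sketch stays independent of
// the Apify SDK and of the real Keboola upload code (both are assumptions here).
async function finishRun(mode, records, { pushData, uploadToKeboola }) {
    // Every mode writes its results to the dataset.
    await pushData(records);
    if (mode === 'COUNT') {
        // The count mode only feeds internal benchmarking, so skip the upload.
        return { uploaded: false, recordCount: records.length };
    }
    await uploadToKeboola(records);
    return { uploaded: true, recordCount: records.length };
}

// Usage with in-memory stand-ins for the real services.
(async () => {
    const dataset = [];
    const keboola = [];
    const deps = {
        pushData: async (items) => { dataset.push(...items); },
        uploadToKeboola: async (items) => { keboola.push(...items); },
    };
    console.log(await finishRun('COUNT', [{ totalCount: 120 }], deps)); // { uploaded: false, recordCount: 1 }
    console.log(await finishRun('DAILY', [{ itemId: 'a' }], deps));     // { uploaded: true, recordCount: 1 }
})();
```

Injecting the two side effects keeps the mode logic testable without network access; the crawling code itself stays shared between modes.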
I agree with @rarous's approach; it will be the simplest.
Hey, I guess it is not properly explained in the docs, but I don't think it makes sense to have a count actor that does the same as the main actor. The idea behind it is to double-check the result, i.e. to find and reasonably use the numbers that tell how many items there are in each category, like with Rozetka here:
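One hedged reading of this double-check: compare the per-category totals the shop itself reports with the number of items the daily run actually scraped. The function name `compareCounts`, the input shapes, and the 5% tolerance below are illustrative assumptions, not project requirements.

```javascript
// Hypothetical cross-check between the totals a shop reports per category
// (e.g. an "N items" figure on a category page) and what the daily run scraped.
// The default 5% tolerance is an illustrative assumption.
function compareCounts(reported, scraped, tolerance = 0.05) {
    const report = [];
    for (const [category, expected] of Object.entries(reported)) {
        const actual = scraped[category] || 0;
        // Relative difference in either direction; an empty category must scrape empty.
        const ok = expected === 0
            ? actual === 0
            : Math.abs(expected - actual) / expected <= tolerance;
        report.push({ category, expected, actual, ok });
    }
    // Overall total as reported by the shop, e.g. for a { totalCount: value } record.
    const totalCount = Object.values(reported).reduce((sum, n) => sum + n, 0);
    return { totalCount, report };
}

// Usage with made-up numbers.
const { totalCount, report } = compareCounts(
    { laptops: 200, phones: 100 },
    { laptops: 196, phones: 80 },
);
console.log(totalCount);                                          // 300
console.log(report.filter((r) => !r.ok).map((r) => r.category)); // [ 'phones' ]
```

Using the shop's own numbers as the reference means a count run only has to read one figure per category rather than crawl every product, which is what distinguishes it from the main actor.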
@rarous @metalwarrior665 Hi guys!
Regarding my points, it is done. But @rarous needs to check that it works with the system, and @zpelechova needs to check the output.
@vladyslav-n I tried to run the actor with
It looks like this happens only when part of the URL is
@janfiedler Hi!
These are 3 new TypeScript actors for the Hlidac Rozetka project by Geniusee, with edits made after the previous pull request attempt.