This is a test project that uses IMDb data, with Ruby on Rails (API-only) for the backend. You can search for movies with at least 4 characters and for cast members with at least 3 characters (both searches are case-sensitive). You can also sign up and add movies, genres, or cast members to your favorites; to do that you need to sign in, otherwise all you can do is search. The project uses a MongoDB database on the main branch; I'll implement another branch with PostgreSQL and fix the problems mentioned below in that branch. You can reach the Frontend app from here.
- Ruby
- Rails API
- GraphQL
- MongoDB
- To get this project up and running locally, you must already have Ruby installed on your computer.
- To run the local database server and connect the Ruby driver to it, you need to install mongod on your computer. Please refer to the link before continuing.
I tried to host the database on MongoDB Atlas, but the free tier is limited to 500 MB, which is not enough for this data. I also tried hard but couldn't find a way to make mongoid.yml
see the host option for a remote server. I'll try again in the future and share the conclusion here.
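For reference, pointing Mongoid at a remote cluster usually goes through the clients section of mongoid.yml. This is only a sketch, not a tested configuration for this project; the connection string below is a placeholder, not a real cluster:

```yaml
development:
  clients:
    default:
      # Placeholder Atlas-style connection string; replace the user,
      # password, and cluster host with your own values.
      uri: mongodb+srv://<user>:<password>@<cluster-host>/film_shelf_development?retryWrites=true
      options:
        server_selection_timeout: 5
```

With a `uri` set this way, the host and database options come from the connection string instead of separate `hosts:` and `database:` keys.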
Setup
- Clone this repository with
git clone https://github.com/eypsrcnuygr/film_shelf
using your terminal or command line.
- Change to the project directory by entering
cd film_shelf
in the terminal.
- Open two more terminal windows.
- At one terminal run
ulimit -n 4096
. This raises the limit on open files before continuing with the database creation steps (it does not add memory). Here you can read about the maximum open files issue on Stack Overflow.
- Next run
mongod --dbpath /Users/$your_user_name/Documents/local_datas/mongo_db/data --logpath /Users/$your_user_name/Documents/local_datas/mongo_db/logs/mongo.log
to start the mongo server. You can also change the paths, if you wish!
- At the other terminal run
mongo
to start the mongo shell. It's not mandatory, but I like to see and manipulate the database from the shell.
- Then, at the first terminal, inside the API folder (/film_shelf), run
bundle
to install the necessary gems.
- Wait for bundle to finish.
- Then run
rails s
to start the application server.
- From the IMDb datasets, please download the
name.basics.tsv.gz
and
title.basics.tsv.gz
and unzip them.
- Then open another terminal, change to the directory of the unzipped files, and run
split -l 25000 title.basics.tsv
. This command splits the TSV file into chunks of 25,000 lines each.
- On the same terminal, run
for i in *; do mv "$i" "$i.tsv"; done
. This command appends the .tsv extension to the split files. Collect them and put them inside the public folder under the name
title_tsvs
(/film_shelf/public/title_tsvs).
- On the same terminal, run
split -l 25000 name.basics.tsv
. This command splits the TSV file into chunks of 25,000 lines each.
- On the same terminal, run
for i in *; do mv "$i" "$i.tsv"; done
. This command appends the .tsv extension to the split files. Collect them and put them inside the public folder under the name
name_tsvs
(/film_shelf/public/name_tsvs).
- You can now change the terminal directory back to the app's directory.
- As you can see, there are four rake tasks added; we'll use three of them. In the terminal run
rake convert_titles
and
rake convert_names
respectively. These create JSON files from the TSV files and store them inside the public folder; we'll need them for our MongoDB database.
- Then run
rake db_creator_custom
. This rake task has two options; I have used both. If you wish to have timestamps on your records, don't change anything in the task, but it will take more storage on your computer and more than a day to finish. If you don't care about timestamps, swap the commented and uncommented parts. The task should then look like this; you can copy it from here and paste it into the
db_creator.rake
file:
require 'csv'
require 'json'

desc 'Create a MongoDB database from the JSON files'
task :db_creator_custom do
  client = Mongo::Client.new(['127.0.0.1:27017'], database: 'film_shelf_development')

  # Insert the title chunks, turning the comma-separated genres string
  # on each record into an array first.
  (0..332).each do |i|
    titles = JSON.parse(File.read("#{Dir.pwd}/public/title_jsons/title#{i}.json"))
    titles.each do |title_hash|
      title_hash['genres'] = title_hash['genres'].split(',')
    end
    client["titles#{i}"].insert_many(titles)
  end

  # Same for the name chunks and their knownForTitles field.
  (0..450).each do |i|
    names = JSON.parse(File.read("#{Dir.pwd}/public/name_jsons/name#{i}.json"))
    names.each do |name_hash|
      name_hash['knownForTitles'] = name_hash['knownForTitles'].split(',')
    end
    client["names#{i}"].insert_many(names)
  end
end
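To illustrate what the task does to each record before inserting it: the comma-separated genres string from the converted JSON is split into an array. A minimal standalone example (the record itself is made up):

```ruby
# A record as it appears in the converted JSON files: genres is one string.
title_hash = { 'primaryTitle' => 'Example Movie', 'genres' => 'Comedy,Drama,Romance' }

# The same transformation the rake task applies before insert_many:
title_hash['genres'] = title_hash['genres'].split(',')

title_hash['genres'] # => ["Comedy", "Drama", "Romance"]
```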
- If you choose the second option, it'll take around half an hour.
- After the task is finished you are all good and ready to go.
- Enjoy!
Repository Content
- It is a default Rails app with the API-only option, configured without Active Record.
- It also uses the GraphQL query language to talk to the Frontend through just one endpoint.
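Talking to that single endpoint from Ruby can be sketched as below. The query and its field names (`searchTitles`, `primaryTitle` as a GraphQL field) are hypothetical, since the actual schema lives in the app's GraphQL types; adjust them to match:

```ruby
require 'json'
require 'net/http'

# Hypothetical query; the real field names depend on the schema defined
# under app/graphql, so treat `searchTitles` as a placeholder.
query = <<~GRAPHQL
  query {
    searchTitles(term: "Matrix") {
      primaryTitle
    }
  }
GRAPHQL

payload = JSON.generate(query: query)

# With the server from `rails s` running locally, the request would be:
# Net::HTTP.post(URI('http://localhost:3000/graphql'), payload,
#                'Content-Type' => 'application/json')
```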
👤 Eyüp Sercan UYGUR
- If I find time, I'm planning to cover more edge cases and error handling on the Backend. Right now the tests are not well thought out and could fail under some circumstances; I'll add more cases and make the API more robust against failing conditions in the upcoming days.
- Better logging-in and current_user tracing would be nice. For every session, a separate
logged_in
boolean could be the next step.
- My relations cause the app to make a lot of queries, so maybe embedding
titles
and the
names
collections could be an approach. However, there is also a
sharding
option, which needs better indexing, so I'm leaving it for later.
- For better test cases, a separate test database would be nice!
- The User page takes too long to render because it makes a lot of queries; again, a better relation between movies and actors can solve this issue!
- After adding two
genres
to your favorites, the app starts to offer some movies to you. I didn't use any known recommendation algorithm, because those algorithms need better-related data, and I didn't want to make more queries just to fetch the related fields for the frontend. So my approach is one-way (from genre to movie) with two axes (movie-genre). Again, if there is a chance in the future, I can make the algorithm more advanced, and maybe even try a content-based approach instead of a model-based approach.
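The one-way, genre-to-movie idea can be sketched roughly as follows. The data here is made up for illustration; in the app, the records would come from the titles collections built by the rake task:

```ruby
# Favorite genres a user has added (illustrative data).
favorite_genres = ['Comedy', 'Drama']

# A few title records in the shape produced by the db_creator task.
titles = [
  { 'primaryTitle' => 'A', 'genres' => ['Drama', 'Romance'] },
  { 'primaryTitle' => 'B', 'genres' => ['Horror'] },
  { 'primaryTitle' => 'C', 'genres' => ['Comedy'] }
]

# One-way recommendation: from favorite genres to any title sharing a genre.
suggestions = titles.select { |t| (t['genres'] & favorite_genres).any? }
suggestions.map { |t| t['primaryTitle'] } # => ["A", "C"]
```

Going the other way (from a movie back to users who might like it) would need the extra queries the text mentions, which is why the approach stays one-way.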
Contributions, issues and feature requests are welcome!
Give a ⭐️ if you like this project!
This project is MIT licensed.