π― This task is part of my assignment focused on creating and configuring databases in different platforms, such as Google Cloud Platform's BigQuery, MongoDB Atlas, and Redis Cloud. Please click the blue text
to get directly to each database platform.
π My Experience and Reflections and Final Thoughts on the database platforms are based on my experience as a graduate student completing this assignment.
-
Link to the dataset used: Fake Dataset
-
Below is an overview to the healthcare dataset that will be uploaded in different platforms.
PatientID,Name,Age,Gender,DiagnosisCode,VisitDate,Hospital,TreatmentPlan,FollowUpDate 1,John Doe,45,M,M54.5,2024-09-10,Stony Brook Hospital,Physical Therapy,2024-09-20 2,Jane Smith,29,F,E11.9,2024-08-15,Stony Brook Hospital,Insulin,2024-09-15 3,Bob Johnson,65,M,I10,2024-10-01,Long Island Clinic,Hypertension Medication,2024-11-01 4,Alice Williams,50,F,J45.909,2024-07-22,Southampton Hospital,Bronchodilators,2024-08-22 5,Michael Brown,37,M,G43.909,2024-06-12,Stony Brook Hospital,Triptans, 6,Susan Davis,54,F,I25.10,2024-05-10,Stony Brook Hospital,Statins,2024-06-10
-
-
Dataset Explanation:
- PatientID: Unique identifier for each patient (integer).
- Name: Patient's full name (string).
- Age: Age of the patient (integer).
- Gender: Gender of the patient (string: M or F).
- DiagnosisCode: ICD-10 code representing the patientβs diagnosis (string).
- VisitDate: Date of the hospital visit (YYYY-MM-DD format).
- Hospital: Name of the hospital or clinic where the visit took place (string).
- TreatmentPlan: The prescribed treatment plan for the patient (string).
- FollowUpDate: Date for a follow-up visit, if applicable (YYYY-MM-DD format).
-
Navigated to BigQuery in the Google Cloud Console.
-
Created a new dataset named
raqs_no_sql
in BigQuery -
Uploaded the provided healthcare dataset (CSV file) into a new table named
module8_nosql_hw
within my newly created datasetraqs_no_sql
. -
Noted the connection details and the BigQuery editor interface.
-
I modified and explored the data. These are the queries that I used to retrieve data in GCP's BigQuery.
(1)
SELECT PatientID, Name, Age, DiagnosisCode, Hospital, TreatmentPlan FROM 'maraquel-soriano-hha504.raqs_no_sql.module8-nosql-hw WHERE Hospital = 'Stony Brook Hospital';
(2)
SELECT * FROM 'maraquel-soriano-hha504.raqs_no_sql module8-nosql-hw' WHERE Age >40
-
To get a better view of what I have explored, see the screenshots below.
-
To accomplish this task, I navigated to MongoDB Atlas and registered for the free tier using my
Stony Brook email
. -
Created a new database instance named
module8-nosql-hw
and with basic configuration settings. I performed this step on the website and using the MongoDB Compass application, which I downloaded to my laptop. -
Inserted the provided healthcare dataset into a collection named
patienthealthdata
, which I created when I first started the new database. I ensured each row was converted to a JSON document. -
This is the query that I used to retrieve simple patient data in MongoDB Compass.
{ "Hospital": "Stony Brook Hospital", "Age": { "$gt": 30 } }
-
I modified and explored the data and ran a simple query to retrieve patient data.
-
Using the MongoDB Compass application, which I downloaded to my laptop, I realized that the CSV file could be converted into a JSON format. I just needed to hover to the right-hand side of the screen and click the curly bracket icon. Each row containing patient data was converted to a JSON format, which looked like this for each patient:
{ "_id": { "$oid": "671e7083387ab9818bd8fe31" }, "PatientID": 2, "Name": "Jane Smith", "Age": 29, "Gender": "F", "DiagnosisCode": "E11.9", "VisitDate": { "$date": "2024-08-15T00:00:00.000Z" }, "Hospital": "Stony Brook Hospital", "TreatmentPlan": "Insulin", "FollowUpDate": { "$date": "2024-09-15T00:00:00.000Z" } }
-
To accomplish this task, I navigated to Redis Cloud and signed up for a free tier account using my
Stony Brook email
. -
Created a new Redis database instance named
raqs-nosql-hw
. -
Connected Redis to
Python
usingVisual Studio Code
. -
I explored the patient data in
Redis Insight
after inserting data into Redis using Python in Visual Studio Code. I installed Redis Insight on my laptop after I created a Redis database instance.
BIGQUERY | MONGODB | REDIS | |
---|---|---|---|
π Setup Process and Configuration |
(1) Setting up a Database: Creating a new database was straightforward through the Google Cloud Console. (2) Dataset Upload: I find that uploading a new dataset was efficiently done fast within the Google Cloud Console. It gave me the option to upload a CSV file from my local machine or through Google Cloud Storage, which I very find convenient. Also, the uploaded data is imported into tables within the dataset, it gave me the option to review it before completing the process. (3) BigQuery was able to auto-detect the schema, but it still gave me an option to modify it if needed. (4) BigQuery can handle large-scale data with auto-scaling. With my recent exposure to GCP, I found that BigQuery was able to manage this large amount of data effectively. |
(1) Creating a Cluster: Setting up a new cluster in MongoDB Atlas was intuitive, allowing me to choose both cloud providers (such as AWS, GCP, and Azure) and regions. (2) Dataset Upload: I utilized MongoDB Compass, a GUI tool for MongoDB, to upload a CSV file into a 'cluster' to create a healthcare dataset (collection) named 'patienthealthdata.' (3) Configuration: MongoDB allows setting database users/roles that ensures secured access to the uploaded healthcare dataset. |
(1) Creating a Database Instance: Like BigQuery and MongoDB, it seems straightforward to create a database instance with its configuration settings. (2) Connecting to Redis' Database Instance: I utilized Redis Insight to connect my Redis database instance to manage the data that will be uploaded. (3) Dataset Uploading or Insertion: I used a Python script in Visual Studio Code to insert the healthcare dataset into Redis, treating PatientID as the key. |
π Reflections on User-Interface and Usability | (1) GCP makes it easy for a beginner like me to write/run queries, explore the data, and manage tables. (2) With the help of my weekly assignments with Oracle, I have recent exposure to SQL. This made it easier for me to start and run queries within the healthcare dataset. (3) BigQuery's speed in making analytical queries on large dataset is quite impressive. This observation is based on my experience with a previous assignment. |
(1) I find both the MongoDB Atlas web interface and MongoDB Compass visually appealing, particularly Compass. Similar to BigQuery, MongoDB Atlas (and Compass) provides the ability to manage newly created databases, collections (tables), and user access. Initially, I was confused, but after exploring the interface, I realized it was just unfamiliar to me. I prefer using Compass, which I downloaded on my laptop. (2) I need to get used to MongoDB's query language. Having been exposed to SQL queries, which I find easier compared to MongoDB, I got confused even after following my professor's video recording. However, I used the simplest query. I know with time and practice my querying skills will likely improve, which will be part of my learning journey. |
(1) Similar to MongoDB Atlas and Compass, I find the Redis web interface and Redis Insight (downloaded on my laptop) visually appealing. However, I was confused while I was attempting to perform the Redis task. At some point while navigating, I almost gave up because I couldn't find what I was looking for, especially when I wanted to run simple queries to complete this assignment. Then, I realized that, like MongoDB, I am not quite familiar with Redis yet. I need more exposure and practice to get used to it. Also, I find that Redis is more complex, requiring more understanding and effort to get accustomed to the data and commands/languages to make this platform work for the datasets I will choose to import in the future compared to the other two platforms. This is probably just me. (2) I noticed that in Redis, the default data type for keys is "string." For instance, after I uploaded the healthcare dataset and explored the data, "PatientID" were stored as a string by default for each patient information. This is my first time using Redis, and perhaps it is designed this way to make it more simplified in managing databases. For a beginner like me, it can be confusing, but as I mentioned previously in MongoDB, with time and practice my skills will likely improve. |