Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

682 dataset task list fetch data from all active resources #717

Merged
Show file tree
Hide file tree
Changes from 49 commits
Commits
Show all changes
56 commits
Select commit Hold shift + click to select a range
c3d1ad6
update tasklist to show only relevant issues
GeorgeGoodall Dec 4, 2024
2d19b15
Merge commit 'b91dc88aca0bb6e03d76aa0d67ca70ec3e8ce7e6' into 682-data…
GeorgeGoodall Dec 9, 2024
0159e31
Merge commit '600fc91c6065299ffb567c7ce0e90f4f28ca5f47' into 682-data…
GeorgeGoodall Dec 9, 2024
0a9d508
fix bug for invalid param
GeorgeGoodall Dec 9, 2024
af39dc9
fix tests
GeorgeGoodall Dec 9, 2024
8c89de9
remove comments
GeorgeGoodall Dec 9, 2024
381095a
remove unneeded middleware for getting entity issues
GeorgeGoodall Dec 11, 2024
e79497a
remove and fix tests
GeorgeGoodall Dec 11, 2024
7e7c2cf
add js doc
GeorgeGoodall Dec 12, 2024
0dba092
add tests
GeorgeGoodall Dec 12, 2024
595658d
update test name
GeorgeGoodall Dec 12, 2024
70f35d8
Merge remote-tracking branch 'origin/main' into 682-dataset-task-list…
GeorgeGoodall Dec 12, 2024
b3240e1
Merge branch 'main' into 682-dataset-task-list-fetch-data-from-all-ac…
GeorgeGoodall-GovUk Dec 12, 2024
1aba7d3
update queries to ensure we are getting the correct issues
GeorgeGoodall Dec 12, 2024
19190c7
get entity counts for resources
GeorgeGoodall Dec 12, 2024
b4d4674
fix available datasets
GeorgeGoodall Dec 12, 2024
0ffd784
get entity_count on resource
GeorgeGoodall Dec 12, 2024
82eb4bf
look for entity null too
GeorgeGoodall Dec 12, 2024
99235f0
dont group issues as doing it in the query
GeorgeGoodall Dec 12, 2024
099f009
bug fixes
GeorgeGoodall Dec 13, 2024
e789a72
fix tests
GeorgeGoodall Dec 13, 2024
d74b089
Update src/middleware/common.middleware.js
GeorgeGoodall-GovUk Dec 13, 2024
c017706
Merge branch '682-dataset-task-list-fetch-data-from-all-active-resour…
GeorgeGoodall Dec 13, 2024
33c0938
Update src/middleware/datasetTaskList.middleware.js
GeorgeGoodall-GovUk Dec 13, 2024
87d819a
Update src/middleware/middleware.builders.js
GeorgeGoodall-GovUk Dec 13, 2024
d65bb83
Update src/middleware/middleware.builders.js
GeorgeGoodall-GovUk Dec 13, 2024
16f21b9
improved fetchEntitiesIssueCount
GeorgeGoodall Dec 13, 2024
eacadf1
Update src/middleware/middleware.builders.js
GeorgeGoodall-GovUk Dec 13, 2024
f6ec462
improved error handling
GeorgeGoodall Dec 13, 2024
de308c9
Merge commit 'a0b54a89a0180af991df6123d5eb6aa59a1e22de' into 682-data…
GeorgeGoodall Dec 13, 2024
5b034c4
Update src/middleware/datasetTaskList.middleware.js
GeorgeGoodall-GovUk Dec 13, 2024
5ffa55c
Update src/middleware/datasetTaskList.middleware.js
GeorgeGoodall-GovUk Dec 13, 2024
8d2183b
fix special issue types
GeorgeGoodall Dec 13, 2024
061a38c
fix test
GeorgeGoodall Dec 13, 2024
7de56d1
use + instead of concat in sql query
GeorgeGoodall Dec 17, 2024
81db693
Merge commit 'b261a0b8d7096f5953b6fa045805047cce30068b' into 682-data…
GeorgeGoodall Dec 17, 2024
f84569f
improve prepareEntry param validation
GeorgeGoodall Dec 17, 2024
0fa31bb
Merge commit '9d7c5c06325089a1f2aa03e9996b00f182a8e977' into 682-data…
GeorgeGoodall Jan 13, 2025
3b39eaa
improve error logging
GeorgeGoodall Jan 13, 2025
c3ddbe1
compare timestamps instead of potenitally incorect dates
GeorgeGoodall Jan 13, 2025
6b8d964
make sure to explictly test that rowcount is an integer
GeorgeGoodall Jan 13, 2025
8445b48
improve jsdoc
GeorgeGoodall Jan 14, 2025
03fc2ac
explicitly state collumns we are fetching
GeorgeGoodall Jan 14, 2025
6737606
improve error handling
GeorgeGoodall Jan 14, 2025
ef9697a
improve error handling
GeorgeGoodall Jan 14, 2025
7eb1eb8
fix: Validate issue structure in entry issue details middleware
GeorgeGoodall Jan 14, 2025
90d39ad
Update task title in datasetTaskList.middleware.test.js
GeorgeGoodall Jan 14, 2025
7d1cf92
Revert "Update task title in datasetTaskList.middleware.test.js"
GeorgeGoodall Jan 14, 2025
f0b7c7a
fix test
GeorgeGoodall Jan 14, 2025
b325713
ensure entry count is always a number
GeorgeGoodall Jan 15, 2025
6149abe
update logging
GeorgeGoodall Jan 15, 2025
c2a85cd
fail all for fetch from all datasets
GeorgeGoodall Jan 15, 2025
309f105
remove unused middleware functions
GeorgeGoodall Jan 15, 2025
17256ce
fixing some entity issues that have entry numbers (due to duplicate r…
GeorgeGoodall Jan 15, 2025
5a65eb8
Merge commit '9ba44827cabe1eb319f03e24fad27fe709e25834' into 682-data…
GeorgeGoodall Jan 16, 2025
edb4eea
fix unit tests
GeorgeGoodall Jan 16, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
188 changes: 118 additions & 70 deletions src/middleware/common.middleware.js
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
// @ts-check
import logger from '../utils/logger.js'
import { types } from '../utils/logging.js'
import { entryIssueGroups } from '../utils/utils.js'
import performanceDbApi from '../services/performanceDbApi.js'
import { fetchMany, fetchOne, FetchOneFallbackPolicy, FetchOptions, renderTemplate } from './middleware.builders.js'
import * as v from 'valibot'
Expand Down Expand Up @@ -195,7 +196,7 @@ export const createPaginationTemplateParams = (req, res, next) => {

export const fetchResources = fetchMany({
query: ({ req }) => `
SELECT r.end_date, r.entry_date, r.mime_type, r.resource, r.start_date, rle.endpoint_url, rle.licence, rle.status, rle.latest_log_entry_date, rle.endpoint_entry_date from resource r
SELECT DISTINCT rd.dataset, r.end_date, r.entry_date, r.mime_type, r.resource, r.start_date, rle.endpoint_url, rle.licence, rle.status, rle.latest_log_entry_date, rle.endpoint_entry_date from resource r
GeorgeGoodall-GovUk marked this conversation as resolved.
Show resolved Hide resolved
LEFT JOIN resource_organisation ro ON ro.resource = r.resource
LEFT JOIN resource_dataset rd ON rd.resource = r.resource
LEFT JOIN reporting_latest_endpoints rle ON r.resource = rle.resource
Expand All @@ -206,6 +207,31 @@ export const fetchResources = fetchMany({
result: 'resources'
})

export const addEntityCountsToResources = async (req, res, next) => {
const { resources } = req

const promises = resources.map(resource => {
const query = `SELECT entry_count FROM dataset_resource WHERE resource = "${resource.resource}"`
return datasette.runQuery(query, resource.dataset)
GeorgeGoodall-GovUk marked this conversation as resolved.
Show resolved Hide resolved
})

try {
const datasetResources = await Promise.all(promises).catch(error => {
logger.error('Failed to fetch dataset resources', { type: types.DataFetch, errorMessage: error.message, errorStack: error.stack })
throw error
})

req.resources = resources.map((resource, i) => {
return { ...resource, entry_count: datasetResources[i]?.formattedData[0]?.entry_count }
GeorgeGoodall-GovUk marked this conversation as resolved.
Show resolved Hide resolved
})

next()
} catch (error) {
logger.error('Error in addEntityCountsToResources', { type: types.App, errorMessage: error.message, errorStack: error.stack })
next(error)
}
}

// Specification

export const fetchSpecification = fetchOne({
Expand Down Expand Up @@ -383,67 +409,25 @@ export const filterOutEntitiesWithoutIssues = (req, res, next) => {

const fetchEntityIssuesForFieldAndType = fetchMany({
query: ({ req, params }) => {
const issueTypeFilter = params.issue_type ? `AND issue_type = '${params.issue_type}'` : ''
const issueFieldFilter = params.issue_field ? `AND field = '${params.issue_field}'` : ''

const issueTypeClause = params.issue_type ? `AND i.issue_type = '${params.issue_type}'` : ''
const issueFieldClause = params.issue_field ? `AND field = '${params.issue_field}'` : ''
return `
SELECT e.entity, i.* FROM entity e
INNER JOIN issue i ON e.entity = i.entity
WHERE e.organisation_entity = ${req.orgInfo.entity}
${issueTypeFilter}
${issueFieldFilter}`
select i.issue_type, field, entity, message, severity, value
from issue i
LEFT JOIN issue_type it ON i.issue_type = it.issue_type
WHERE resource = '${req.resources[0].resource}'
${issueTypeClause}
AND it.responsibility = 'external'
AND it.severity = 'error'
${issueFieldClause}
AND i.dataset = '${req.params.dataset}'
AND entity != ''
`
GeorgeGoodall-GovUk marked this conversation as resolved.
Show resolved Hide resolved
// LIMIT ${req.dataRange.pageLength} OFFSET ${req.dataRange.offset}
},
dataset: FetchOptions.fromParams,
result: 'issues'
})

export const FilterOutIssuesToMostRecent = (req, res, next) => {
const { resources, issues } = req

const issuesWithResources = issues.filter(issue => {
if (!issue.resource || !resources.find(resource => resource.resource === issue.resource)) {
logger.warn(`Missing resource on issue: ${JSON.stringify(issue)}`)
return false
}
return true
})

const groupedIssues = issuesWithResources.reduce((acc, current) => {
current.start_date = new Date(resources.find(resource => resource.resource === current.resource)?.start_date)
const { entity, field } = current
if (!acc[entity]) {
acc[entity] = {}
}
if (!acc[entity][field]) {
acc[entity][field] = []
}
acc[entity][field].push(current)

return acc
}, {})

const recentIssues = Object.fromEntries(Object.entries(groupedIssues).map(([entityName, issuesByEntity]) =>
[
entityName,
Object.fromEntries(Object.entries(issuesByEntity).map(([field, issues]) => [
field,
issues.sort((a, b) => b.start_date.getTime() - a.start_date.getTime())[0]
]))
]
))

const issuesFlattened = []

Object.values(recentIssues).forEach(issueByEntry => {
Object.values(issueByEntry).forEach(issueByField => {
issuesFlattened.push(issueByField)
})
})

req.issues = issuesFlattened
next()
}

export const removeIssuesThatHaveBeenFixed = async (req, res, next) => {
const { issues, resources } = req

Expand Down Expand Up @@ -518,18 +502,80 @@ export const addFieldMappingsToIssue = (req, res, next) => {
next()
}

export const addReferencesToIssues = (req, res, next) => {
const { issues, entities } = req
// We can only get the issues without entity from the latest resource as we have no way of knowing if those in previous resources have been fixed?
export const fetchEntryIssues = fetchMany({
query: ({ req, params }) => {
const issueTypeClause = params.issue_type ? `AND i.issue_type = '${params.issue_type}'` : ''
const issueFieldClause = params.issue_field ? `AND field = '${params.issue_field}'` : ''
return `
select i.issue_type, field, entity, message, severity, value, line_number
from issue i
LEFT JOIN issue_type it ON i.issue_type = it.issue_type
WHERE resource = '${req.resources[0].resource}'
${issueTypeClause}
AND it.responsibility = 'external'
AND it.severity = 'error'
AND i.dataset = '${req.params.dataset}'
${issueFieldClause}
AND (entity = '' OR entity is NULL OR i.issue_type in ('${entryIssueGroups.map(issue => issue.type).join("', '")}'))
LIMIT ${req.dataRange.pageLength} OFFSET ${req.dataRange.offset}
`
Comment on lines +507 to +522
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix SQL injection vulnerabilities in fetchEntryIssues.

The query is vulnerable to SQL injection in multiple places.

-    const issueTypeClause = params.issue_type ? `AND i.issue_type = '${params.issue_type}'` : ''
-    const issueFieldClause = params.issue_field ? `AND field = '${params.issue_field}'` : ''
+    const values = [req.resources[0].resource]
+    let paramCount = 2
+    const clauses = []
+    
+    if (params.issue_type) {
+      clauses.push(`AND i.issue_type = $${paramCount++}`)
+      values.push(params.issue_type)
+    }
+    if (params.issue_field) {
+      clauses.push(`AND field = $${paramCount++}`)
+      values.push(params.issue_field)
+    }
+    values.push(req.params.dataset)
+    values.push(entryIssueGroups.map(issue => issue.type))
+    values.push(req.dataRange.pageLength)
+    values.push(req.dataRange.offset)
+    
+    return {
+      text: `
         select i.issue_type, field, entity, message, severity, value, line_number
         from issue i
         LEFT JOIN issue_type it ON i.issue_type = it.issue_type
-        WHERE resource = '${req.resources[0].resource}'
-        ${issueTypeClause}
+        WHERE resource = $1
+        ${clauses.join(' ')}
         AND it.responsibility = 'external'
         AND it.severity = 'error'
-        AND i.dataset = '${req.params.dataset}'
-        ${issueFieldClause}
-        AND (entity = '' OR entity is NULL OR i.issue_type in ('${entryIssueGroups.map(issue => issue.type).join("', '")}'))
-        LIMIT ${req.dataRange.pageLength} OFFSET ${req.dataRange.offset}
+        AND i.dataset = $${paramCount++}
+        AND (entity = '' OR entity is NULL OR i.issue_type = ANY($${paramCount++}::text[]))
+        LIMIT $${paramCount++} OFFSET $${paramCount}
+        `,
+      values
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
query: ({ req, params }) => {
const issueTypeClause = params.issue_type ? `AND i.issue_type = '${params.issue_type}'` : ''
const issueFieldClause = params.issue_field ? `AND field = '${params.issue_field}'` : ''
return `
select i.issue_type, field, entity, message, severity, value, line_number
from issue i
LEFT JOIN issue_type it ON i.issue_type = it.issue_type
WHERE resource = '${req.resources[0].resource}'
${issueTypeClause}
AND it.responsibility = 'external'
AND it.severity = 'error'
AND i.dataset = '${req.params.dataset}'
${issueFieldClause}
AND (entity = '' OR entity is NULL OR i.issue_type in ('${entryIssueGroups.map(issue => issue.type).join("', '")}'))
LIMIT ${req.dataRange.pageLength} OFFSET ${req.dataRange.offset}
`
query: ({ req, params }) => {
const values = [req.resources[0].resource]
let paramCount = 2
const clauses = []
if (params.issue_type) {
clauses.push(`AND i.issue_type = $${paramCount++}`)
values.push(params.issue_type)
}
if (params.issue_field) {
clauses.push(`AND field = $${paramCount++}`)
values.push(params.issue_field)
}
values.push(req.params.dataset)
values.push(entryIssueGroups.map(issue => issue.type))
values.push(req.dataRange.pageLength)
values.push(req.dataRange.offset)
return {
text: `
select i.issue_type, field, entity, message, severity, value, line_number
from issue i
LEFT JOIN issue_type it ON i.issue_type = it.issue_type
WHERE resource = $1
${clauses.join(' ')}
AND it.responsibility = 'external'
AND it.severity = 'error'
AND i.dataset = $${paramCount++}
AND (entity = '' OR entity is NULL OR i.issue_type = ANY($${paramCount++}::text[]))
LIMIT $${paramCount++} OFFSET $${paramCount}
`,
values
}

},
result: 'entryIssues'
})

req.issues = issues.map(issue => {
const reference = entities.find(entity => entity.entity === issue.entity)?.reference
export const fetchEntityIssueCounts = fetchMany({
query: ({ req }) => {
const datasetClause = req.params.dataset ? `AND i.dataset = '${req.params.dataset}'` : ''
return `
select dataset, field, i.issue_type, COUNT(resource+line_number) as count
from issue i
LEFT JOIN issue_type it ON i.issue_type = it.issue_type
WHERE resource in ('${req.resources.map(resource => resource.resource).join("', '")}')
AND (entity != '' AND entity IS NOT NULL)
AND it.responsibility = 'external'
AND it.severity = 'error'
${datasetClause}
GROUP BY field, i.issue_type, dataset
`
},
result: 'entityIssueCounts'
})
GeorgeGoodall-GovUk marked this conversation as resolved.
Show resolved Hide resolved

return { ...issue, reference }
export const getMostRecentResources = (resources) => {
const mostRecentResourcesMap = {}
resources.forEach(resource => {
const currentRecent = mostRecentResourcesMap[resource.dataset]
if (!currentRecent || new Date(currentRecent.start_date).getTime() < new Date(resource.start_date).getTime()) {
mostRecentResourcesMap[resource.dataset] = resource
}
})

next()
return Object.values(mostRecentResourcesMap)
}

export const fetchEntryIssueCounts = fetchMany({
query: ({ req }) => {
const datasetClause = req.params.dataset ? `AND i.dataset = '${req.params.dataset}'` : ''

const mostRecentResources = getMostRecentResources(req.resources)

const resourceIds = Object.values(mostRecentResources).map(resource => resource.resource)

return `
select dataset, field, i.issue_type, COUNT(resource + line_number) as count
from issue i
LEFT JOIN issue_type it ON i.issue_type = it.issue_type
WHERE resource in ('${resourceIds.join("', '")}')
AND (entity = '' OR entity is NULL)
AND it.responsibility = 'external'
AND it.severity = 'error'
${datasetClause}
GROUP BY field, i.issue_type, dataset
`
},
result: 'entryIssueCounts'
})
GeorgeGoodall-GovUk marked this conversation as resolved.
Show resolved Hide resolved

/**
* This middleware chain is responsible for retrieving all entities for the given organisation, their latest issues,
* filtering out issues that have been fixed, and constructing the table params.
Expand All @@ -544,11 +590,11 @@ export const addReferencesToIssues = (req, res, next) => {
*/
export const processRelevantIssuesMiddlewares = [
fetchEntityIssuesForFieldAndType,
FilterOutIssuesToMostRecent,
removeIssuesThatHaveBeenFixed,
// arguably removeIssuesThatHaveBeenFixed should be s step however we have only currently found one organisation,
// however this step is very time consuming, so in order to progress im commenting it out for now
// removeIssuesThatHaveBeenFixed,
fetchFieldMappings,
addFieldMappingsToIssue,
addReferencesToIssues
addFieldMappingsToIssue
]

// Other
Expand Down Expand Up @@ -608,14 +654,16 @@ export const getSetDataRange = (pageLength) => (req, res, next) => {

export function getErrorSummaryItems (req, res, next) {
const { issue_type: issueType, issue_field: issueField } = req.params
const { issues, issueCount, entities, resources } = req
const { entryIssues, issues: entityIssues, issueCount, entities, resources } = req

const issues = entityIssues || entryIssues
GeorgeGoodall-GovUk marked this conversation as resolved.
Show resolved Hide resolved

const totalRecordCount = entities ? entities.length : resources[0].entry_count
const totalIssues = issueCount?.count || issues.length

const errorHeading = ''
const issueItems = [{
html: performanceDbApi.getTaskMessage({ issue_type: issueType, num_issues: totalIssues, entityCount: totalRecordCount, field: issueField }, true)
html: performanceDbApi.getTaskMessage({ issue_type: issueType, num_issues: totalIssues, rowCount: totalRecordCount, field: issueField }, true)
}]

req.errorSummary = {
Expand Down
Loading
Loading