Form for entering corpus definitions (first draft) #982

Open
lukavdplas opened this issue Nov 25, 2022 · 0 comments · May be fixed by #1659
Labels: corpus (changes to corpus definitions or new corpora), enhancement (improvements to user functionality), major (major changes to functionality and/or the code base), needs-mockup (this suggestion could use a picture before it is implemented)

The interface should have a menu to enter corpus definitions.

This form will be rather complicated. Our current concept is to realise this as a process with multiple steps, which guides the curator through the stages of adding a corpus. See the detailed proposal below.

Step 0: initialising a corpus

The curator opens the corpus creation menu. They are asked to provide a unique name for the corpus.

The backend adds the corpus to the SQL database and links it to the curator's user ID.
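To make step 0 concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table layout, the `create_corpus` helper, and the `owner_id` column are hypothetical names chosen for illustration; the real backend may well use an ORM and a different schema.

```python
import sqlite3


def init_db(conn):
    # Hypothetical schema: the UNIQUE constraint enforces the unique corpus name,
    # and new corpora start out private.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS corpus ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL UNIQUE,"
        " owner_id INTEGER NOT NULL,"
        " availability TEXT NOT NULL DEFAULT 'private')"
    )


def create_corpus(conn, name, owner_id):
    """Insert a new corpus linked to the curator; reject names already in use."""
    try:
        with conn:  # commits on success, rolls back on error
            cursor = conn.execute(
                "INSERT INTO corpus (name, owner_id) VALUES (?, ?)",
                (name, owner_id),
            )
    except sqlite3.IntegrityError:
        raise ValueError(f"corpus name already in use: {name!r}") from None
    return cursor.lastrowid
```

Letting the database enforce uniqueness avoids a race between checking the name and inserting the row.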

Step 1: defining data extraction

When the curator has confirmed their choice, they are taken to step 1 of the form, where they will fill in how the data should be extracted.

To start, they are required to upload an example CSV file. (The upload can have a size limit; it really should be a small sample.)

The backend reads the provided CSV file and assembles a list of the available column names. The rest of the form now unlocks and the curator can start filling things in.
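Reading the column names could look roughly like the sketch below. It assumes the upload arrives as bytes, is UTF-8 encoded, and has a header row; `MAX_SAMPLE_BYTES` and `read_column_names` are hypothetical names, and the size limit is an arbitrary placeholder.

```python
import csv
import io

MAX_SAMPLE_BYTES = 1_000_000  # placeholder limit, not a decided value


def read_column_names(raw, max_bytes=MAX_SAMPLE_BYTES):
    """Return the header row of an uploaded CSV sample as a list of column names."""
    if len(raw) > max_bytes:
        raise ValueError("sample file is too large")
    # utf-8-sig strips a byte order mark if the file has one
    text = raw.decode("utf-8-sig")
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if not header:
        raise ValueError("CSV file has no header row")
    return [name.strip() for name in header]
```

The returned list is what the frontend would use to populate the column dropdown.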

The most important step here is to add fields to the corpus. For each field, the curator fills in the name, type, and picks from a dropdown which column from the CSV should be used to extract the value. (Some advanced options may also be available here.)
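A field definition as described here could be represented and checked along these lines. This is only a sketch: `FieldDefinition`, `validate_fields`, and the set of type names are assumptions for illustration, not the existing corpus definition API.

```python
from dataclasses import dataclass

# Hypothetical set of field types offered in the form
FIELD_TYPES = {"text", "keyword", "date", "integer", "float"}


@dataclass
class FieldDefinition:
    name: str    # name of the field in the corpus
    type: str    # one of FIELD_TYPES
    column: str  # CSV column the value is extracted from


def validate_fields(fields, csv_columns):
    """Return a list of human-readable problems; an empty list means the form is valid."""
    errors = []
    seen = set()
    for field in fields:
        if not field.name:
            errors.append("field name must not be empty")
        elif field.name in seen:
            errors.append(f"duplicate field name: {field.name}")
        seen.add(field.name)
        if field.type not in FIELD_TYPES:
            errors.append(f"{field.name}: unknown type {field.type!r}")
        if field.column not in csv_columns:
            errors.append(f"{field.name}: column {field.column!r} is not in the CSV")
    return errors
```

Returning all problems at once, rather than raising on the first, lets the form highlight every invalid field in one pass.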

Step 2: verifying extraction

The curator has confirmed their definition and moves to the next menu, where the backend will try to extract the data from the example file and the curator can see if everything works as expected.

The backend runs the documents() function of the corpus, resulting in the JSON data that would be sent to elasticsearch during indexing. Of course, any error messages or warnings will be shown here.

The extracted JSON is sent to the frontend and shown to the curator. They can review the JSON and download it to run tests, if they want.
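The verification step could be sketched as follows. This is a simplified stand-in for running the corpus's `documents()` function, not its actual implementation: `preview_extraction`, `extract_document`, the converter table, and the dictionary-based field format are all hypothetical.

```python
import csv
import io
import json

# Hypothetical converters per field type
CONVERTERS = {"text": str, "integer": int, "float": float}


def extract_document(row, fields):
    """Build one document from a CSV row according to the field definitions."""
    return {
        field["name"]: CONVERTERS[field["type"]](row[field["column"]])
        for field in fields
    }


def preview_extraction(csv_text, fields):
    """Extract all rows, collecting errors instead of stopping at the first one."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    documents, errors = [], []
    for number, row in enumerate(reader, start=1):
        try:
            documents.append(extract_document(row, fields))
        except (KeyError, TypeError, ValueError) as error:
            errors.append(f"data row {number}: {error!r}")
    return {"documents": documents, "errors": errors}


sample = "year,title\n1999,First\nunknown,Second\n"
fields = [
    {"name": "year", "type": "integer", "column": "year"},
    {"name": "title", "type": "text", "column": "title"},
]
result = preview_extraction(sample, fields)
print(json.dumps(result, indent=2))  # what the frontend would display
```

In this sample, the second data row has a non-numeric year, so it ends up in `errors` while the first row is extracted normally.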

At this point, the curator can go back to step 1 and make some more edits to the form. If there were no errors and the curator is satisfied, they can go to step 3.

Step 3: indexing

The curator is happy with their choice and wants to index the corpus. There are several options for realising the uploading and indexing; these are tracked in a separate issue. It is possible to start with a draft version where this step is 'contact a developer and ask them to index the corpus'.

Step 4: interface settings

We now have an indexed corpus in I-analyzer. The corpus availability is still set to 'private', so the curator can see it but regular users can't.

The curator can view the corpus in the I-analyzer interface. They may have reason to clear the index and go back to step 2. If not, they use this step of the form to fine-tune the interface.

Here, the curator chooses interface settings that don't affect the elasticsearch index. For example, they can pick which fields are shown as filters, or choose the image and description for the corpus.

As will be explained in the form, the settings in this step can be changed at any time. Changing the settings from earlier steps would require re-indexing, but everything here can be changed and takes effect immediately.
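One way the backend could distinguish the two kinds of settings is sketched below. The setting names in `INTERFACE_ONLY` are hypothetical examples based on the ones mentioned in this step; the actual list would follow from the corpus definition format.

```python
# Hypothetical names of settings that never touch the index
INTERFACE_ONLY = {"description", "image", "filter_fields"}


def changed_keys(old, new):
    """Return the names of all settings whose value differs between two versions."""
    return {key for key in old.keys() | new.keys() if old.get(key) != new.get(key)}


def requires_reindex(old, new):
    """True if any changed setting falls outside the interface-only group."""
    return bool(changed_keys(old, new) - INTERFACE_ONLY)
```

With a check like this, the form can save interface changes directly and warn the curator before saving anything that would invalidate the index.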

Step 5: publish the corpus

The corpus is ready and the curator sets its availability to 'public'. The corpus is now available to regular users. It probably makes sense to let the curator also pick which user roles have access at this point.
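The resulting access rule might look like the sketch below. It assumes that the owner can always see their corpus and that an empty role selection means "all users"; both are assumptions for illustration, as are the `Corpus`, `User`, `publish`, and `can_view` names.

```python
from dataclasses import dataclass, field


@dataclass
class Corpus:
    name: str
    owner_id: int
    availability: str = "private"
    allowed_roles: set = field(default_factory=set)


@dataclass
class User:
    id: int
    roles: set = field(default_factory=set)


def publish(corpus, allowed_roles=()):
    """Make the corpus public, optionally restricted to the given roles."""
    corpus.availability = "public"
    corpus.allowed_roles = set(allowed_roles)


def can_view(user, corpus):
    if user.id == corpus.owner_id:
        return True  # the curator always sees their own corpus
    if corpus.availability != "public":
        return False
    # Assumption: no selected roles means the corpus is open to everyone
    return not corpus.allowed_roles or bool(corpus.allowed_roles & user.roles)
```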

lukavdplas added the enhancement and major labels Nov 25, 2022
lukavdplas added the corpus label Dec 8, 2022
lukavdplas added the needs-mockup label Feb 27, 2024
lukavdplas linked a pull request Dec 13, 2024 that will close this issue