We are currently working on models that can segment literary texts into coherent parts, which we call scenes.
After initially discussing and formally defining the task, we organised a shared task at KONVENS 2021 to collect new ideas. We are now developing new models that can handle this complex task.
Our most recent publicly released dataset is the one from the shared task, which is available here.
Since the annotated texts are protected by copyright, we cannot distribute them directly. Instead, we publish only the annotations, which can be merged with the full texts of the books. You are welcome to contact us for assistance with obtaining the full datasets.
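To illustrate what merging annotations with a full text can look like, here is a minimal sketch in Python. It assumes that each annotation is a standoff record with character offsets into the book text. The `SceneAnnotation` class, its field names, and the `merge_annotations` function are hypothetical and chosen for illustration only; the released files may use a different format.

```python
from dataclasses import dataclass


@dataclass
class SceneAnnotation:
    # Hypothetical standoff record: character offsets into the full text.
    begin: int
    end: int
    label: str


def merge_annotations(full_text, annotations):
    """Pair each annotation label with the span of text it covers."""
    merged = []
    for ann in sorted(annotations, key=lambda a: a.begin):
        # Reject offsets that do not fit the text, e.g. a different edition.
        if not (0 <= ann.begin < ann.end <= len(full_text)):
            raise ValueError(f"Offsets out of range: {ann}")
        merged.append((ann.label, full_text[ann.begin:ann.end]))
    return merged


# Toy example; the text and offsets are invented for illustration.
text = "She opened the door. Hours later, the train left."
annotations = [
    SceneAnnotation(21, 49, "Scene"),
    SceneAnnotation(0, 20, "Scene"),
]
segments = merge_annotations(text, annotations)
for label, span in segments:
    print(label, "|", span)
```

Offset-based merging only works if the text you hold is character-for-character identical to the one that was annotated, which is why the sketch checks the offsets before slicing.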
If you are interested in this topic, feel free to contact Albin!