Models

Pydantic models for OpenAIRE API entities and responses.

DataSourceResponse = ApiResponse[DataSource] module-attribute

Type alias for an API response containing a list of DataSource entities.

OrganizationResponse = ApiResponse[Organization] module-attribute

Type alias for an API response containing a list of Organization entities.

ProjectResponse = ApiResponse[Project] module-attribute

Type alias for an API response containing a list of Project entities.

ResearchProductResponse = ApiResponse[ResearchProduct] module-attribute

Type alias for an API response containing a list of ResearchProduct entities.

ApiResponse

Bases: BaseModel

Generic Pydantic model for standard OpenAIRE API list responses.

This model represents the common envelope structure for API responses that return a list of entities. It includes a header (metadata) and a results field containing the list of entities. It is generic over EntityType to allow specific entity types to be used in the results list.

Attributes:

Name Type Description
header Header

A Header object containing metadata about the response.

results list[EntityType] | None

An optional list of entities of type EntityType. A validator passes a list or None through unchanged and replaces any other value with an empty list after logging a warning.

Source code in src/aireloom/models/base.py
class ApiResponse[EntityType: "BaseEntity"](BaseModel):
    """Generic Pydantic model for standard OpenAIRE API list responses.

    This model represents the common envelope structure for API responses that
    return a list of entities. It includes a `header` (metadata) and a `results`
    field containing the list of entities. It is generic over `EntityType` to
    allow specific entity types to be used in the `results` list.

    Attributes:
        header: A `Header` object containing metadata about the response.
        results: An optional list of entities of type `EntityType`. A validator
                 ensures this field is a list or None, handling potential API
                 inconsistencies gracefully.
    """

    header: Header
    # Results can sometimes be null/absent, sometimes an empty list
    results: list[EntityType] | None = None

    @field_validator("results", mode="before")
    @classmethod
    def handle_null_results(cls, v: Any) -> list[EntityType] | None:
        """Ensure 'results' is a list or None.

        Handles potential None or unexpected formats from the API.
        Logs a warning and returns an empty list for unexpected types.
        """
        if v is None:
            return None  # Explicitly return None if API sends null
        if isinstance(v, list):
            return v  # Already a list

        # Handle unexpected formats (e.g., dict wrappers like {'result': [...]})
        # or other non-list types by logging and returning an empty list.
        logger.warning(
            f"Unexpected format for 'results' field: {type(v)}. "
            f"Expected list or None, got {v!r}. Returning empty list."
        )
        return []

    model_config = ConfigDict(extra="allow")

handle_null_results(v) classmethod

Ensure 'results' is a list or None.

Handles potential None or unexpected formats from the API. Logs a warning and returns an empty list for unexpected types.
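The three branches described above can be restated as a plain function. This is a minimal sketch outside Pydantic, and the name `normalize_results` is hypothetical. It only mirrors the logic of the validator shown in the source listing.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def normalize_results(v: Any) -> Optional[list]:
    """Standalone restatement of the validator's three branches."""
    if v is None:
        return None  # an explicit null from the API stays None
    if isinstance(v, list):
        return v  # lists pass through untouched
    # Anything else (dict wrapper, string, number) is replaced by [].
    logger.warning("Unexpected format for 'results' field: %s", type(v))
    return []


print(normalize_results(None))                 # None
print(normalize_results([{"id": "a"}]))        # [{'id': 'a'}]
print(normalize_results({"result": [1, 2]}))   # []
```

Callers therefore need to handle both None and an empty list when iterating over results.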

Source code in src/aireloom/models/base.py
@field_validator("results", mode="before")
@classmethod
def handle_null_results(cls, v: Any) -> list[EntityType] | None:
    """Ensure 'results' is a list or None.

    Handles potential None or unexpected formats from the API.
    Logs a warning and returns an empty list for unexpected types.
    """
    if v is None:
        return None  # Explicitly return None if API sends null
    if isinstance(v, list):
        return v  # Already a list

    # Handle unexpected formats (e.g., dict wrappers like {'result': [...]})
    # or other non-list types by logging and returning an empty list.
    logger.warning(
        f"Unexpected format for 'results' field: {type(v)}. "
        f"Expected list or None, got {v!r}. Returning empty list."
    )
    return []

BaseEntity

Bases: BaseModel

A base Pydantic model for OpenAIRE entities (e.g., publication, project).

This model provides a common foundation for all specific entity types, primarily by ensuring an id field is present, which is a common identifier across most OpenAIRE entities. It allows extra fields from the API to be captured without causing validation errors.

Attributes:

Name Type Description
id str

The unique identifier for the entity.
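The effect of extra="allow" can be seen in a small sketch. It assumes Pydantic v2 is installed and redefines a minimal stand-in for BaseEntity locally rather than importing the library. The identifier and the field name `undocumentedField` are placeholders.

```python
from pydantic import BaseModel, ConfigDict


class BaseEntity(BaseModel):
    """Local stand-in with the same shape as the documented model."""

    id: str

    model_config = ConfigDict(extra="allow")


# "undocumentedField" is not declared on the model, yet validation succeeds.
entity = BaseEntity.model_validate(
    {"id": "placeholder::123", "undocumentedField": 42}
)

print(entity.id)           # placeholder::123
print(entity.model_extra)  # {'undocumentedField': 42}
```

Undeclared keys are kept in `model_extra` without type validation, so treat them as raw API values.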

Source code in src/aireloom/models/base.py
class BaseEntity(BaseModel):
    """A base Pydantic model for OpenAIRE entities (e.g., publication, project).

    This model provides a common foundation for all specific entity types,
    primarily by ensuring an `id` field is present, which is a common
    identifier across most OpenAIRE entities. It allows extra fields from the
    API to be captured without causing validation errors.

    Attributes:
        id: The unique identifier for the entity.
    """

    # Common identifier across most entities
    id: str

    model_config = ConfigDict(extra="allow")

ControlledField

Bases: BaseModel

Represents a field with a controlled vocabulary, typically including a scheme and a value.

This model is used for structured data elements where the value has a specific meaning defined by an associated scheme (e.g., a PID like DOI, or a subject classification from a specific thesaurus).

Attributes:

Name Type Description
scheme str | None

The scheme or system defining the context of the value (e.g., "doi", "orcid", "mesh").

value str | None

The actual value from the controlled vocabulary.

Source code in src/aireloom/models/data_source.py
class ControlledField(BaseModel):
    """Represents a field with a controlled vocabulary, typically including a scheme and a value.

    This model is used for structured data elements where the value has a specific
    meaning defined by an associated scheme (e.g., a PID like DOI, or a subject
    classification from a specific thesaurus).

    Attributes:
        scheme: The scheme or system defining the context of the value (e.g., "doi", "orcid", "mesh").
        value: The actual value from the controlled vocabulary.
    """

    scheme: str | None = None
    value: str | None = None

    model_config = ConfigDict(extra="allow")

Country

Bases: BaseModel

Represents the country associated with an organization.

Attributes:

Name Type Description
code str | None

The ISO 3166-1 alpha-2 country code (e.g., "GR", "US").

label str | None

The human-readable name of the country (e.g., "Greece").

Source code in src/aireloom/models/organization.py
class Country(BaseModel):
    """Represents the country associated with an organization.

    Attributes:
        code: The ISO 3166-1 alpha-2 country code (e.g., "GR", "US").
        label: The human-readable name of the country (e.g., "Greece").
    """

    code: str | None = None
    label: str | None = None

    model_config = ConfigDict(extra="allow")

DataSource

Bases: BaseEntity

Model representing an OpenAIRE Data Source entity.

A data source in OpenAIRE can be a repository, journal, aggregator, etc. This model captures various metadata fields associated with a data source.

Source code in src/aireloom/models/data_source.py
class DataSource(BaseEntity):
    """Model representing an OpenAIRE Data Source entity.

    A data source in OpenAIRE can be a repository, journal, aggregator, etc.
    This model captures various metadata fields associated with a data source.
    """

    originalIds: list[str] | None = Field(default_factory=list)
    pids: list[ControlledField] | None = Field(default_factory=list)
    type: ControlledField | None = None
    openaireCompatibility: str | None = None
    officialName: str | None = None
    englishName: str | None = None
    websiteUrl: str | None = None
    logoUrl: str | None = None
    dateOfValidation: str | None = None
    description: str | None = None
    subjects: list[str] | None = Field(default_factory=list)
    languages: list[str] | None = Field(default_factory=list)
    contentTypes: list[str] | None = Field(default_factory=list)
    releaseStartDate: str | None = None
    releaseEndDate: str | None = None
    accessRights: AccessRightType | None = None
    uploadRights: AccessRightType | None = None
    databaseAccessRestriction: DatabaseRestrictionType | None = None
    dataUploadRestriction: str | None = None
    versioning: bool | None = None
    citationGuidelineUrl: str | None = None
    pidSystems: str | None = None
    certificates: str | None = None
    policies: list[str] | None = Field(default_factory=list)
    missionStatementUrl: str | None = None
    # Added based on documentation/analysis
    journal: Container | None = None

    model_config = ConfigDict(extra="allow")

Funding

Bases: BaseModel

Represents funding information for a project, including the source and stream.

Attributes:

Name Type Description
fundingStream FundingStream | None

A FundingStream object detailing the specific stream.

jurisdiction str | None

The jurisdiction associated with the funding (e.g., country code).

name str | None

The name of the funding body or organization.

shortName str | None

An optional short name or acronym for the funding body.

Source code in src/aireloom/models/project.py
class Funding(BaseModel):
    """Represents funding information for a project, including the source and stream.

    Attributes:
        fundingStream: A `FundingStream` object detailing the specific stream.
        jurisdiction: The jurisdiction associated with the funding (e.g., country code).
        name: The name of the funding body or organization.
        shortName: An optional short name or acronym for the funding body.
    """

    fundingStream: FundingStream | None = None
    jurisdiction: str | None = None
    name: str | None = None
    shortName: str | None = None
    model_config = ConfigDict(extra="allow")

FundingStream

Bases: BaseModel

Represents details about a specific funding stream for a project.

Attributes:

Name Type Description
description str | None

A description of the funding stream.

id str | None

The unique identifier of the funding stream.

Source code in src/aireloom/models/project.py
class FundingStream(BaseModel):
    """Represents details about a specific funding stream for a project.

    Attributes:
        description: A description of the funding stream.
        id: The unique identifier of the funding stream.
    """

    description: str | None = None
    id: str | None = None
    model_config = ConfigDict(extra="allow")

Grant

Bases: BaseModel

Represents details about the grant amounts associated with a project.

Attributes:

Name Type Description
currency str | None

The currency code for the amounts (e.g., "EUR", "USD").

fundedAmount float | None

The amount of funding awarded.

totalCost float | None

The total cost of the project.

Source code in src/aireloom/models/project.py
class Grant(BaseModel):
    """Represents details about the grant amounts associated with a project.

    Attributes:
        currency: The currency code for the amounts (e.g., "EUR", "USD").
        fundedAmount: The amount of funding awarded.
        totalCost: The total cost of the project.
    """

    currency: str | None = None
    fundedAmount: float | None = None
    totalCost: float | None = None
    model_config = ConfigDict(extra="allow")

H2020Programme

Bases: BaseModel

Represents details about an H2020 programme related to a project.

Attributes:

Name Type Description
code str | None

The code of the H2020 programme.

description str | None

A description of the H2020 programme.

Source code in src/aireloom/models/project.py
class H2020Programme(BaseModel):
    """Represents details about an H2020 programme related to a project.

    Attributes:
        code: The code of the H2020 programme.
        description: A description of the H2020 programme.
    """

    code: str | None = None
    description: str | None = None
    model_config = ConfigDict(extra="allow")

Header

Bases: BaseModel

Represents the 'header' section commonly found in OpenAIRE API responses.

This model captures metadata about the API response, such as status, query time, total number of results found (numFound), pagination details like nextCursor, and page size. It includes validators to coerce numeric fields that might be returned as strings by the API.

Attributes:

Name Type Description
status str | None

Optional status message from the API.

code str | None

Optional status code from the API.

message str | None

Optional descriptive message from the API.

queryTime int | None

Time taken by the API to process the query, in milliseconds.

numFound int | None

Total number of results found matching the query criteria.

nextCursor str | HttpUrl | None

The cursor string to use for fetching the next page of results. Can be a string or an HttpUrl.

pageSize int | None

The number of results included in the current page.
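One way nextCursor might drive pagination is sketched below. Everything other than the header/results envelope shape is an assumption made for illustration: the in-memory `FAKE_PAGES`, the `fetch_page` helper, the `"*"` start cursor, and the rule that a null nextCursor ends iteration.

```python
from typing import Iterator, Optional

# Hypothetical in-memory "API": maps a cursor to a response envelope.
# Only the header/results shape follows the models documented here.
FAKE_PAGES: dict = {
    "*": {
        "header": {"numFound": "3", "pageSize": "2", "nextCursor": "c1"},
        "results": [{"id": "p1"}, {"id": "p2"}],
    },
    "c1": {
        "header": {"numFound": "3", "pageSize": "2", "nextCursor": None},
        "results": [{"id": "p3"}],
    },
}


def fetch_page(cursor: str) -> dict:
    """Hypothetical stand-in for an HTTP request."""
    return FAKE_PAGES[cursor]


def iter_results(start_cursor: str = "*") -> Iterator[dict]:
    """Follow nextCursor until the header no longer provides one."""
    cursor: Optional[str] = start_cursor
    while cursor is not None:
        page = fetch_page(cursor)
        yield from page.get("results") or []  # results may be null
        cursor = page["header"].get("nextCursor")


ids = [item["id"] for item in iter_results()]
print(ids)  # ['p1', 'p2', 'p3']
```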

Source code in src/aireloom/models/base.py
class Header(BaseModel):
    """Represents the 'header' section commonly found in OpenAIRE API responses.

    This model captures metadata about the API response, such as status,
    query time, total number of results found (`numFound`), pagination details
    like `nextCursor`, and page size. It includes validators to coerce
    numeric fields that might be returned as strings by the API.

    Attributes:
        status: Optional status message from the API.
        code: Optional status code from the API.
        message: Optional descriptive message from the API.
        queryTime: Time taken by the API to process the query, in milliseconds.
        numFound: Total number of results found matching the query criteria.
        nextCursor: The cursor string to use for fetching the next page of results.
                    Can be a string or an HttpUrl.
        pageSize: The number of results included in the current page.
    """

    # Note: status, code, message are typically expected, but optional for robustness.
    status: str | None = None
    code: str | None = None
    message: str | None = None
    # Numeric fields are often strings in the API response; the validator below coerces them.
    queryTime: int | None = None
    numFound: int | None = None
    # The cursor can be a full URL or just the cursor string.
    nextCursor: str | HttpUrl | None = Field(default=None)  # API returns "nextCursor"
    pageSize: int | None = None

    @field_validator("queryTime", "numFound", "pageSize", mode="before")
    @classmethod
    def coerce_str_to_int(cls, v: Any) -> int | None:
        """Coerce string representations of numbers to integers, logging on failure."""
        if isinstance(v, str):
            try:
                return int(v)
            except (ValueError, TypeError):
                logger.warning(f"Could not coerce header value '{v}' to int.")
                return None
        # Allow integers through if they somehow bypass 'before' validation or API changes
        if isinstance(v, int):
            return v
        # Handle other unexpected types if necessary
        logger.warning(f"Unexpected type {type(v)} for header numeric value '{v}'.")
        return None

    model_config = ConfigDict(extra="allow")

coerce_str_to_int(v) classmethod

Coerce string representations of numbers to integers, logging on failure.
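The coercion rules can be tried in isolation with a plain function that mirrors the validator's branches. This is a sketch only, and the name `coerce_header_int` is hypothetical.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def coerce_header_int(v: Any) -> Optional[int]:
    """Mirror of the validator: numeric strings become int, bad input becomes None."""
    if isinstance(v, str):
        try:
            return int(v)
        except ValueError:
            logger.warning("Could not coerce header value %r to int.", v)
            return None
    if isinstance(v, int):
        return v  # already an integer
    # Floats, None, dicts and so on all end up here.
    logger.warning("Unexpected type %s for header numeric value %r.", type(v), v)
    return None


print(coerce_header_int("42"))   # 42
print(coerce_header_int(7))      # 7
print(coerce_header_int("abc"))  # None
print(coerce_header_int(3.5))    # None
```

In this sketch an explicit None also reaches the final branch, so it is logged as an unexpected type before None is returned.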

Source code in src/aireloom/models/base.py
@field_validator("queryTime", "numFound", "pageSize", mode="before")
@classmethod
def coerce_str_to_int(cls, v: Any) -> int | None:
    """Coerce string representations of numbers to integers, logging on failure."""
    if isinstance(v, str):
        try:
            return int(v)
        except (ValueError, TypeError):
            logger.warning(f"Could not coerce header value '{v}' to int.")
            return None
    # Allow integers through if they somehow bypass 'before' validation or API changes
    if isinstance(v, int):
        return v
    # Handle other unexpected types if necessary
    logger.warning(f"Unexpected type {type(v)} for header numeric value '{v}'.")
    return None

Organization

Bases: BaseEntity

Model representing an OpenAIRE Organization entity.

Captures details about an organization, including its names, website, country, and various persistent identifiers. Inherits the id field from BaseEntity.

Attributes:

Name Type Description
legalShortName str | None

The official short name or acronym of the organization.

legalName str | None

The full official legal name of the organization.

alternativeNames list[str] | None

A list of other known names for the organization.

websiteUrl str | None

The URL of the organization's official website.

country Country | None

A Country object representing the organization's country.

pids list[OrganizationPid] | None

A list of OrganizationPid objects representing various PIDs associated with the organization.

Source code in src/aireloom/models/organization.py
class Organization(BaseEntity):
    """Model representing an OpenAIRE Organization entity.

    Captures details about an organization, including its names, website,
    country, and various persistent identifiers. Inherits the `id` field
    from `BaseEntity`.

    Attributes:
        legalShortName: The official short name or acronym of the organization.
        legalName: The full official legal name of the organization.
        alternativeNames: A list of other known names for the organization.
        websiteUrl: The URL of the organization's official website.
        country: A `Country` object representing the organization's country.
        pids: A list of `OrganizationPid` objects representing various PIDs
              associated with the organization.
    """

    # id is inherited from BaseEntity
    legalShortName: str | None = None
    legalName: str | None = None
    alternativeNames: list[str] | None = Field(default_factory=list)
    websiteUrl: str | None = None
    country: Country | None = None
    pids: list[OrganizationPid] | None = Field(default_factory=list)

    model_config = ConfigDict(extra="allow")

OrganizationPid

Bases: BaseModel

Represents a persistent identifier (PID) for an organization.

Attributes:

Name Type Description
scheme str | None

The scheme of the PID (e.g., "ror", "grid", "isni").

value str | None

The value of the PID.
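A common task with a list of PIDs is picking out the value for one scheme. The sketch below uses plain dicts as stand-ins for OrganizationPid objects. The `find_pid` helper, the case-insensitive match, and the identifier values are all assumptions made for illustration.

```python
from typing import Optional


def find_pid(pids: Optional[list], scheme: str) -> Optional[str]:
    """Return the first PID value whose scheme matches, else None."""
    for pid in pids or []:  # pids may be None or an empty list
        # scheme and value are both optional on the model, so guard for None.
        if (pid.get("scheme") or "").lower() == scheme.lower():
            return pid.get("value")
    return None


# Placeholder identifiers, not real PIDs.
org_pids = [
    {"scheme": "grid", "value": "grid.example.1"},
    {"scheme": "ROR", "value": "ror-example-2"},
    {"scheme": None, "value": "orphan"},
]

print(find_pid(org_pids, "ror"))   # ror-example-2
print(find_pid(org_pids, "isni"))  # None
print(find_pid(None, "ror"))       # None
```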

Source code in src/aireloom/models/organization.py
class OrganizationPid(BaseModel):
    """Represents a persistent identifier (PID) for an organization.

    Attributes:
        scheme: The scheme of the PID (e.g., "ror", "grid", "isni").
        value: The value of the PID.
    """

    scheme: str | None = None
    value: str | None = None

    model_config = ConfigDict(extra="allow")

Project

Bases: BaseEntity

Model representing an OpenAIRE Project entity.

Captures comprehensive information about a research project, including its identifiers, title, funding, duration, and related metadata. Inherits the id field from BaseEntity.

Attributes:

Name Type Description
code str | None

The project code or grant number.

acronym str | None

The acronym of the project.

title str | None

The official title of the project.

callIdentifier str | None

Identifier for the funding call.

fundings list[Funding] | None

A list of Funding objects detailing the project's funding sources.

granted Grant | None

A Grant object with information about the awarded grant amounts.

h2020Programmes list[H2020Programme] | None

A list of H2020Programme objects if the project is part of H2020.

keywords list[str] | str | None

A list of keywords or a single string of keywords describing the project. A validator attempts to split comma- or semicolon-separated strings into a list.

openAccessMandateForDataset bool | None

Boolean indicating if there's an open access mandate for datasets produced by the project.

openAccessMandateForPublications bool | None

Boolean indicating if there's an open access mandate for publications from the project.

startDate str | None

The start date of the project (typically "YYYY-MM-DD" string).

endDate str | None

The end date of the project (typically "YYYY-MM-DD" string).

subjects list[str] | None

A list of subject classifications for the project.

summary str | None

A summary or abstract of the project.

websiteUrl str | None

The URL of the project's official website.
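Because startDate and endDate are kept as strings, callers who need real dates have to parse them defensively. A minimal sketch follows, assuming the typical "YYYY-MM-DD" format. The helper names `parse_project_date` and `duration_days` are hypothetical.

```python
from datetime import date
from typing import Optional


def parse_project_date(value: Optional[str]) -> Optional[date]:
    """Parse a "YYYY-MM-DD" string, returning None for null or malformed input."""
    if not value:
        return None
    try:
        return date.fromisoformat(value)
    except ValueError:
        return None


def duration_days(start: Optional[str], end: Optional[str]) -> Optional[int]:
    """Number of days between two date strings, or None if either is unusable."""
    start_date = parse_project_date(start)
    end_date = parse_project_date(end)
    if start_date is None or end_date is None:
        return None
    return (end_date - start_date).days


print(duration_days("2021-01-01", "2022-01-01"))  # 365
print(duration_days("2021-01-01", None))          # None
print(duration_days("not-a-date", "2022-01-01"))  # None
```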

Source code in src/aireloom/models/project.py
class Project(BaseEntity):
    """Model representing an OpenAIRE Project entity.

    Captures comprehensive information about a research project, including its
    identifiers, title, funding, duration, and related metadata. Inherits the
    `id` field from `BaseEntity`.

    Attributes:
        code: The project code or grant number.
        acronym: The acronym of the project.
        title: The official title of the project.
        callIdentifier: Identifier for the funding call.
        fundings: A list of `Funding` objects detailing the project's funding sources.
        granted: A `Grant` object with information about the awarded grant amounts.
        h2020Programmes: A list of `H2020Programme` objects if the project is part of H2020.
        keywords: A list of keywords or a single string of keywords describing the project.
                  A validator attempts to parse comma or semicolon-separated strings.
        openAccessMandateForDataset: Boolean indicating if there's an open access
                                     mandate for datasets produced by the project.
        openAccessMandateForPublications: Boolean indicating if there's an open access
                                          mandate for publications from the project.
        startDate: The start date of the project (typically "YYYY-MM-DD" string).
        endDate: The end date of the project (typically "YYYY-MM-DD" string).
        subjects: A list of subject classifications for the project.
        summary: A summary or abstract of the project.
        websiteUrl: The URL of the project's official website.
    """

    # id is inherited from BaseEntity
    code: str | None = None
    acronym: str | None = None
    title: str | None = None
    callIdentifier: str | None = None
    fundings: list[Funding] | None = Field(default_factory=list)
    granted: Grant | None = None
    h2020Programmes: list[H2020Programme] | None = Field(default_factory=list)
    # Keywords might be a single string or a delimited string. Attempt parsing.
    keywords: list[str] | str | None = None
    openAccessMandateForDataset: bool | None = None
    openAccessMandateForPublications: bool | None = None
    # Dates are kept as string for safety due to potential missing parts or nulls.
    # Expected format is typically YYYY-MM-DD.
    startDate: str | None = None
    endDate: str | None = None
    subjects: list[str] | None = Field(default_factory=list)
    summary: str | None = None
    websiteUrl: str | None = None

    model_config = ConfigDict(extra="allow")

    @field_validator("keywords", mode="before")
    @classmethod
    def parse_keywords_string(cls, v: Any) -> list[str] | str | None:
        """Attempts to parse a keyword string into a list of strings.

        If the input `v` is a string, this validator tries to split it by common
        delimiters (comma, then semicolon). If splitting results in more than one
        part, a list of stripped parts is returned. Otherwise, the original string
        (or None if empty) is returned. If `v` is not a string (e.g., already a
        list or None), it's returned as is.

        Args:
            v: The value to parse, expected to be a string, list, or None.

        Returns:
            A list of strings if parsing was successful and yielded multiple keywords,
            the original string if no parsing occurred or yielded a single part,
            or None if the input string was empty.
        """
        if isinstance(v, str):
            # Prioritize comma, then semicolon
            delimiters = [",", ";"]
            for delimiter in delimiters:
                parts = [part.strip() for part in v.split(delimiter) if part.strip()]
                if len(parts) > 1:
                    return parts
            # If no split produced multiple parts, return the original string (or None if it was empty)
            return v if v else None
        # If not a string (e.g., already a list or None), return as is
        return v

parse_keywords_string(v) classmethod

Attempts to parse a keyword string into a list of strings.

If the input v is a string, this validator tries to split it by common delimiters (comma, then semicolon). If splitting results in more than one part, a list of stripped parts is returned. Otherwise, the original string (or None if empty) is returned. If v is not a string (e.g., already a list or None), it's returned as is.
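The delimiter priority is easiest to see on concrete inputs. The function below is a standalone sketch that mirrors the validator's logic outside Pydantic. The name `parse_keywords` is hypothetical.

```python
from typing import Any


def parse_keywords(v: Any) -> Any:
    """Split on commas first, then semicolons; keep the input if neither splits it."""
    if isinstance(v, str):
        for delimiter in (",", ";"):
            parts = [part.strip() for part in v.split(delimiter) if part.strip()]
            if len(parts) > 1:
                return parts
        return v if v else None  # single keyword stays a string, "" becomes None
    return v  # lists and None pass through unchanged


print(parse_keywords("graphs, networks"))  # ['graphs', 'networks']
print(parse_keywords("a; b; c"))           # ['a', 'b', 'c']
print(parse_keywords("a, b; c"))           # ['a', 'b; c']
print(parse_keywords("single"))            # single
print(parse_keywords(""))                  # None
```

As the third call shows, a string mixing both delimiters is split on commas only, so a semicolon can remain inside a keyword.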

Parameters:

Name Type Description Default
v Any

The value to parse, expected to be a string, list, or None.

required

Returns:

Type Description
list[str] | str | None

A list of strings if parsing was successful and yielded multiple keywords, the original string if no parsing occurred or yielded a single part, or None if the input string was empty.

Source code in src/aireloom/models/project.py
@field_validator("keywords", mode="before")
@classmethod
def parse_keywords_string(cls, v: Any) -> list[str] | str | None:
    """Attempts to parse a keyword string into a list of strings.

    If the input `v` is a string, this validator tries to split it by common
    delimiters (comma, then semicolon). If splitting results in more than one
    part, a list of stripped parts is returned. Otherwise, the original string
    (or None if empty) is returned. If `v` is not a string (e.g., already a
    list or None), it's returned as is.

    Args:
        v: The value to parse, expected to be a string, list, or None.

    Returns:
        A list of strings if parsing was successful and yielded multiple keywords,
        the original string if no parsing occurred or yielded a single part,
        or None if the input string was empty.
    """
    if isinstance(v, str):
        # Prioritize comma, then semicolon
        delimiters = [",", ";"]
        for delimiter in delimiters:
            parts = [part.strip() for part in v.split(delimiter) if part.strip()]
            if len(parts) > 1:
                return parts
        # If no split produced multiple parts, return the original string (or None if it was empty)
        return v if v else None
    # If not a string (e.g., already a list or None), return as is
    return v

ResearchProduct

Bases: BaseEntity

Model representing an OpenAIRE Research Product entity.

This is a central model in OpenAIRE, representing various outputs of research such as publications, datasets, software, or other types. It aggregates numerous metadata fields. Inherits id from BaseEntity.

Attributes:

Name Type Description
originalIds list[str] | None

A list of original identifiers for the research product.

pids list[Pid] | None

A list of Pid objects representing persistent identifiers.

type ResearchProductType | None

The ResearchProductType (e.g., "publication", "dataset").

title str | None

The main title of the research product.

authors list[Author] | None

A list of Author objects.

bestAccessRight BestAccessRight | None

A BestAccessRight object indicating the determined access status.

country ResultCountry | None

A ResultCountry object indicating the country associated with the product.

description str | None

A textual description or abstract of the research product.

publicationDate str | None

The publication date of the research product (YYYY-MM-DD string).

publisher str | None

The name of the publisher.

indicators Indicator | None

An Indicator object containing citation and usage metrics.

instances list[Instance] | None

A list of Instance objects representing different manifestations or versions of the research product.

language Language | None

A Language object for the primary language of the product.

subjects list[Subject] | None

A list of Subject objects.

container Container | None

A Container object if the product is part of a larger collection (e.g., a journal for an article).

geoLocation GeoLocation | None

A GeoLocation object, typically for datasets.

keywords list[str] | None

A list of keywords. A validator attempts to parse comma-separated strings.

journal Container | None

An alternative field to container, often used for journal details. The API may use either 'container' or 'journal' for similar information.

Source code in src/aireloom/models/research_product.py
class ResearchProduct(BaseEntity):
    """Model representing an OpenAIRE Research Product entity.

    This is a central model in OpenAIRE, representing various outputs of research
    such as publications, datasets, software, or other types. It aggregates
    numerous metadata fields. Inherits `id` from `BaseEntity`.

    Attributes:
        originalIds: A list of original identifiers for the research product.
        pids: A list of `Pid` objects representing persistent identifiers.
        type: The `ResearchProductType` (e.g., "publication", "dataset").
        title: The main title of the research product.
        authors: A list of `Author` objects.
        bestAccessRight: A `BestAccessRight` object indicating the determined access status.
        country: A `ResultCountry` object indicating the country associated with the product.
        description: A textual description or abstract of the research product.
        publicationDate: The publication date of the research product (YYYY-MM-DD string).
        publisher: The name of the publisher.
        indicators: An `Indicator` object containing citation and usage metrics.
        instances: A list of `Instance` objects representing different manifestations
                   or versions of the research product.
        language: A `Language` object for the primary language of the product.
        subjects: A list of `Subject` objects.
        container: A `Container` object if the product is part of a larger collection
                   (e.g., a journal for an article).
        geoLocation: A `GeoLocation` object, typically for datasets.
        keywords: A list of keywords. A validator attempts to parse comma-separated strings.
        journal: An alternative field to `container`, often used for journal details.
                 The API may return similar information under either 'container' or 'journal'.
    """

    # id is inherited from BaseEntity
    originalIds: list[str] | None = Field(default_factory=list)
    pids: list[Pid] | None = Field(default_factory=list)
    type: ResearchProductType | None = None
    title: str | None = None
    authors: list[Author] | None = Field(default_factory=list)
    bestAccessRight: BestAccessRight | None = None
    country: ResultCountry | None = None
    description: str | None = None
    publicationDate: str | None = None
    publisher: str | None = None
    indicators: Indicator | None = None
    instances: list[Instance] | None = Field(default_factory=list)
    language: Language | None = None
    subjects: list[Subject] | None = Field(default_factory=list)
    container: Container | None = None
    geoLocation: GeoLocation | None = None
    keywords: list[str] | None = Field(default_factory=list)
    journal: Container | None = None

    model_config = ConfigDict(extra="allow", populate_by_name=True)

    @field_validator("keywords", mode="before")
    @classmethod
    def split_keywords(cls, v: Any) -> list[str] | None:
        """Attempts to split a comma-separated string of keywords into a list.

        If the input `v` is a string, it is split on commas and each part is stripped
        of whitespace; empty parts are dropped. If `v` is None, None is returned. Any
        other value is logged as unexpected and replaced with None.

        Args:
            v: The value to parse, expected to be a string or None.

        Returns:
            A list of keyword strings (empty if the string held no keywords), or
            None if the input was None or not a string.
        """
        if v is None:
            return None
        if isinstance(v, str):
            return [kw.strip() for kw in v.split(",") if kw.strip()]
        logger.warning(
            f"Unexpected value for ResearchProduct.keywords: {v}. Expected string or None."
        )
        return None  # Or raise ValueError if strictness is preferred

    @model_validator(mode="before")
    @classmethod
    def get_title_from_main_title(cls, data: Any) -> Any:
        """Populates the `title` field from `mainTitle` if `title` is not present.

        The OpenAIRE API sometimes uses `mainTitle` for the primary title. This
        validator copies `mainTitle` into `title` when `title` is missing or None in
        the input data, effectively aliasing `mainTitle` to `title`. In either case
        `mainTitle` is removed from the data.

        Args:
            data: The raw input data dictionary before validation.

        Returns:
            The (potentially modified) input data dictionary.
        """
        if isinstance(data, dict) and "mainTitle" in data:
            if (
                "title" not in data or data["title"] is None
            ):  # Ensure we don't overwrite an existing title
                data["title"] = data.pop("mainTitle")
            else:  # title exists; discard the redundant mainTitle
                data.pop("mainTitle", None)
        return data

get_title_from_main_title(data) classmethod

Populates the title field from mainTitle if title is not present.

The OpenAIRE API sometimes uses mainTitle for the primary title. This validator copies mainTitle into title when title is missing or None in the input data, effectively aliasing mainTitle to title. In either case mainTitle is removed from the data.

Parameters:

Name Type Description Default
data Any

The raw input data dictionary before validation.

required

Returns:

Type Description
Any

The (potentially modified) input data dictionary.

Source code in src/aireloom/models/research_product.py
@model_validator(mode="before")
@classmethod
def get_title_from_main_title(cls, data: Any) -> Any:
    """Populates the `title` field from `mainTitle` if `title` is not present.

    The OpenAIRE API sometimes uses `mainTitle` for the primary title. This
    validator copies `mainTitle` into `title` when `title` is missing or None in
    the input data, effectively aliasing `mainTitle` to `title`. In either case
    `mainTitle` is removed from the data.

    Args:
        data: The raw input data dictionary before validation.

    Returns:
        The (potentially modified) input data dictionary.
    """
    if isinstance(data, dict) and "mainTitle" in data:
        if (
            "title" not in data or data["title"] is None
        ):  # Ensure we don't overwrite an existing title
            data["title"] = data.pop("mainTitle")
        else:  # title exists; discard the redundant mainTitle
            data.pop("mainTitle", None)
    return data
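
The effect of this validator is easiest to see on plain dictionaries. The following is a minimal sketch that copies the logic shown above into a standalone function; `fold_main_title` is a hypothetical name used only for illustration (it is not part of aireloom), and the sample identifiers and titles are made up.

```python
from typing import Any


def fold_main_title(data: Any) -> Any:
    """Standalone copy of the validator's logic (illustrative helper, not library API)."""
    if isinstance(data, dict) and "mainTitle" in data:
        if "title" not in data or data["title"] is None:
            # No usable title: promote mainTitle to title.
            data["title"] = data.pop("mainTitle")
        else:
            # A title is already present: discard the redundant mainTitle.
            data.pop("mainTitle", None)
    return data


only_main = fold_main_title({"id": "p1", "mainTitle": "A study"})
both = fold_main_title({"id": "p2", "title": "Kept", "mainTitle": "Dropped"})
null_title = fold_main_title({"id": "p3", "title": None, "mainTitle": "Promoted"})

print(only_main)   # {'id': 'p1', 'title': 'A study'}
print(both)        # {'id': 'p2', 'title': 'Kept'}
print(null_title)  # {'id': 'p3', 'title': 'Promoted'}
```

The function mutates the dictionary it receives, so a caller that still needs the raw payload afterwards may prefer to pass in a copy.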

split_keywords(v) classmethod

Attempts to split a comma-separated string of keywords into a list.

If the input v is a string, it is split on commas and each part is stripped of whitespace; empty parts are dropped. If v is None, None is returned. Any other value is logged as unexpected and replaced with None.

Parameters:

Name Type Description Default
v Any

The value to parse, expected to be a string or None.

required

Returns:

Type Description
list[str] | None

A list of keyword strings (empty if the string held no keywords), or None if the input was None or not a string.

Source code in src/aireloom/models/research_product.py
@field_validator("keywords", mode="before")
@classmethod
def split_keywords(cls, v: Any) -> list[str] | None:
    """Attempts to split a comma-separated string of keywords into a list.

    If the input `v` is a string, it is split on commas and each part is stripped
    of whitespace; empty parts are dropped. If `v` is None, None is returned. Any
    other value is logged as unexpected and replaced with None.

    Args:
        v: The value to parse, expected to be a string or None.

    Returns:
        A list of keyword strings (empty if the string held no keywords), or
        None if the input was None or not a string.
    """
    if v is None:
        return None
    if isinstance(v, str):
        return [kw.strip() for kw in v.split(",") if kw.strip()]
    logger.warning(
        f"Unexpected value for ResearchProduct.keywords: {v}. Expected string or None."
    )
    return None  # Or raise ValueError if strictness is preferred

ScholixCreator

Bases: BaseModel

Represents a creator (e.g., author, contributor) in the Scholix schema.

Attributes:

Name Type Description
name str | None

The name of the creator (aliased from "Name").

identifier list[ScholixIdentifier] | None

An optional list of ScholixIdentifier objects for the creator.

Source code in src/aireloom/models/scholix.py
class ScholixCreator(BaseModel):
    """Represents a creator (e.g., author, contributor) in the Scholix schema.

    Attributes:
        name: The name of the creator (aliased from "Name").
        identifier: An optional list of `ScholixIdentifier` objects for the creator.
    """

    name: str | None = Field(alias="Name", default=None)
    identifier: list[ScholixIdentifier] | None = Field(alias="Identifier", default=None)

    model_config = ConfigDict(populate_by_name=True, extra="allow")

ScholixEntity

Bases: BaseModel

Represents a scholarly entity (source or target) in a Scholix relationship.

Attributes:

Name Type Description
identifier list[ScholixIdentifier]

A list of ScholixIdentifier objects for the entity.

type ScholixEntityTypeName

The ScholixEntityTypeName (e.g., "publication", "dataset").

sub_type str | None

An optional subtype providing more specific classification.

title str | None

The title of the scholarly entity.

creator list[ScholixCreator] | None

A list of ScholixCreator objects.

publication_date str | None

The publication date of the entity (string format).

publisher list[ScholixPublisher] | None

A list of ScholixPublisher objects.

Source code in src/aireloom/models/scholix.py
class ScholixEntity(BaseModel):
    """Represents a scholarly entity (source or target) in a Scholix relationship.

    Attributes:
        identifier: A list of `ScholixIdentifier` objects for the entity.
        type: The `ScholixEntityTypeName` (e.g., "publication", "dataset").
        sub_type: An optional subtype providing more specific classification.
        title: The title of the scholarly entity.
        creator: A list of `ScholixCreator` objects.
        publication_date: The publication date of the entity (string format).
        publisher: A list of `ScholixPublisher` objects.
    """

    identifier: list[ScholixIdentifier] = Field(alias="Identifier")
    type: ScholixEntityTypeName = Field(alias="Type")
    sub_type: str | None = Field(alias="SubType", default=None)
    title: str | None = Field(alias="Title", default=None)
    creator: list[ScholixCreator] | None = Field(alias="Creator", default=None)
    publication_date: str | None = Field(alias="PublicationDate", default=None)
    publisher: list[ScholixPublisher] | None = Field(alias="Publisher", default=None)

    model_config = ConfigDict(populate_by_name=True, extra="allow")
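
For orientation, the aliases above imply that a raw entity payload uses capitalized keys such as "Identifier", "Type", and "Title", with each identifier carrying "ID" and "IDScheme". The sketch below works on such a raw dictionary directly, without the models; `identifiers_by_scheme` is a hypothetical helper and the sample payload is invented for illustration, not taken from a real API response.

```python
from collections import defaultdict


def identifiers_by_scheme(entity: dict) -> dict[str, list[str]]:
    """Group a raw Scholix entity's identifier values by their scheme."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for ident in entity.get("Identifier", []):
        grouped[ident["IDScheme"]].append(ident["ID"])
    return dict(grouped)


# Made-up payload using the alias names documented above.
sample_entity = {
    "Identifier": [
        {"ID": "10.1234/example.1", "IDScheme": "doi"},
        {"ID": "https://example.org/record/1", "IDScheme": "url"},
    ],
    "Type": "dataset",
    "Title": "Example dataset",
}

print(identifiers_by_scheme(sample_entity))
# {'doi': ['10.1234/example.1'], 'url': ['https://example.org/record/1']}
```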

ScholixIdentifier

Bases: BaseModel

Represents a persistent identifier within the Scholix schema.

Attributes:

Name Type Description
id_val str

The value of the identifier (aliased from "ID").

id_scheme str

The scheme of the identifier (aliased from "IDScheme", e.g., "doi", "url").

id_url HttpUrl | None

An optional resolvable URL for the identifier (aliased from "IDURL").

Source code in src/aireloom/models/scholix.py
class ScholixIdentifier(BaseModel):
    """Represents a persistent identifier within the Scholix schema.

    Attributes:
        id_val: The value of the identifier (aliased from "ID").
        id_scheme: The scheme of the identifier (aliased from "IDScheme", e.g., "doi", "url").
        id_url: An optional resolvable URL for the identifier (aliased from "IDURL").
    """

    id_val: str = Field(alias="ID")
    id_scheme: str = Field(alias="IDScheme")
    id_url: HttpUrl | None = Field(alias="IDURL", default=None)

    model_config = ConfigDict(populate_by_name=True, extra="allow")

ScholixLinkProvider

Bases: BaseModel

Represents the provider of the Scholix link.

Attributes:

Name Type Description
name str

The name of the link provider (aliased from "Name").

identifier list[ScholixIdentifier] | None

An optional list of ScholixIdentifier objects for the provider.

Source code in src/aireloom/models/scholix.py
class ScholixLinkProvider(BaseModel):
    """Represents the provider of the Scholix link.

    Attributes:
        name: The name of the link provider (aliased from "Name").
        identifier: An optional list of `ScholixIdentifier` objects for the provider.
    """

    name: str = Field(alias="Name")
    identifier: list[ScholixIdentifier] | None = Field(alias="Identifier", default=None)

    model_config = ConfigDict(populate_by_name=True, extra="allow")

ScholixPublisher

Bases: BaseModel

Represents a publisher in the Scholix schema.

Attributes:

Name Type Description
name str

The name of the publisher (aliased from "Name").

identifier list[ScholixIdentifier] | None

An optional list of ScholixIdentifier objects for the publisher.

Source code in src/aireloom/models/scholix.py
class ScholixPublisher(BaseModel):
    """Represents a publisher in the Scholix schema.

    Attributes:
        name: The name of the publisher (aliased from "Name").
        identifier: An optional list of `ScholixIdentifier` objects for the publisher.
    """

    name: str = Field(alias="Name")
    identifier: list[ScholixIdentifier] | None = Field(alias="Identifier", default=None)

    model_config = ConfigDict(populate_by_name=True, extra="allow")

ScholixRelationship

Bases: BaseModel

Represents a single Scholix relationship link between two scholarly entities.

This is a core model in the Scholix schema, detailing the link provider, the type of relationship, the source entity, and the target entity.

Attributes:

Name Type Description
link_provider list[ScholixLinkProvider] | None

A list of ScholixLinkProvider objects detailing who provided the link.

relationship_type ScholixRelationshipType

A ScholixRelationshipType object describing the nature of the link.

source ScholixEntity

A ScholixEntity representing the source of the relationship.

target ScholixEntity

A ScholixEntity representing the target of the relationship.

link_publication_date datetime | None

The date when this link was published or made available.

license_url HttpUrl | None

An optional URL pointing to the license governing the use of this link information.

harvest_date str | None

The date when this link information was last harvested or updated.

Source code in src/aireloom/models/scholix.py
class ScholixRelationship(BaseModel):
    """Represents a single Scholix relationship link between two scholarly entities.

    This is a core model in the Scholix schema, detailing the link provider,
    the type of relationship, the source entity, and the target entity.

    Attributes:
        link_provider: A list of `ScholixLinkProvider` objects detailing who provided the link.
        relationship_type: A `ScholixRelationshipType` object describing the nature of the link.
        source: A `ScholixEntity` representing the source of the relationship.
        target: A `ScholixEntity` representing the target of the relationship.
        link_publication_date: The date when this link was published or made available.
        license_url: An optional URL pointing to the license governing the use of this link information.
        harvest_date: The date when this link information was last harvested or updated.
    """

    link_provider: list[ScholixLinkProvider] | None = Field(
        alias="LinkProvider", default=None
    )
    relationship_type: ScholixRelationshipType = Field(alias="RelationshipType")
    source: ScholixEntity = Field(alias="Source")
    target: ScholixEntity = Field(alias="Target")
    link_publication_date: datetime | None = Field(
        alias="LinkPublicationDate",
        default=None,
        description="Date the link was published.",
    )
    license_url: HttpUrl | None = Field(alias="LicenseURL", default=None)
    harvest_date: str | None = Field(alias="HarvestDate", default=None)

    model_config = ConfigDict(populate_by_name=True, extra="allow")

ScholixResponse

Bases: BaseModel

Response structure for the Scholexplorer Links endpoint.

Source code in src/aireloom/models/scholix.py
class ScholixResponse(BaseModel):
    """Response structure for the Scholexplorer Links endpoint."""

    current_page: int = Field(
        alias="currentPage", description="The current page number (0-indexed)."
    )
    total_links: int = Field(
        alias="totalLinks", description="Total number of links matching the query."
    )
    total_pages: int = Field(
        alias="totalPages", description="Total number of pages available."
    )
    result: list[ScholixRelationship] = Field(
        alias="result", description="List of Scholix relationship links."
    )

    model_config = ConfigDict(populate_by_name=True, extra="allow")
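
Because `current_page` is 0-indexed, the off-by-one arithmetic when paging is easy to get wrong. The following sketch shows one way to work out which pages remain; it assumes that `total_pages` is a count, so the last valid index is `total_pages - 1`. Both helper names and the sample envelope values are hypothetical.

```python
def remaining_pages(current_page: int, total_pages: int) -> range:
    """Page indices still to fetch after current_page (0-indexed pages assumed)."""
    return range(current_page + 1, total_pages)


def is_last_page(current_page: int, total_pages: int) -> bool:
    """True when no further page index exists after current_page."""
    return current_page + 1 >= total_pages


# Made-up envelope using the raw key names from the aliases above.
envelope = {"currentPage": 0, "totalLinks": 250, "totalPages": 3, "result": []}

print(list(remaining_pages(envelope["currentPage"], envelope["totalPages"])))  # [1, 2]
print(is_last_page(2, envelope["totalPages"]))  # True
```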