Importing and exporting catalogue data
While it is possible to make all changes to the catalogue through the standard application interface this can sometimes be cumbersome if a large number of products are involved.
For example, if a new UNSPSC code attribute is added to the catalogue, then it will need to be added to almost if not all products. It would quickly become tedious to have to select each product in turn, select a new attribute, enter a code value, then update the product. Mistakes would inevitably be made through inattention, and the task would involve locking individual products for extended periods thereby preventing other administrators from changing them in other ways. Instead, it is much more efficient to perform these code additions as an import batch.
In general, to import data as a batch, the current data must first be exported. This process generates files that can be modified and then imported back in to make the desired changes. It is possible to produce the import files from scratch — the format is detailed below — but it is usually easier to perform a full export/update/import cycle.
Only catalogue superusers are allowed to run exports or imports.
Generating an export
There are 3 steps to creating a catalogue export:
- Select whether the export should contain attribute details about folders or about products.
While the file format generated does support a mixture of products and folders, an export created by an administrator can only produce details about one or the other. This limits any confusion as to what to edit by producing only data relevant to the task at hand, as tasks tend only to be product or folder-related, not both.
- Next, select where the data in the export should come from.
This entails choosing a number of hierarchies to include in the export. Doing this serves two purposes: first, the products or folders exported must be members of these hierarchies, and secondly, any hierarchy-specific attributes will be limited to including relevant values. If no hierarchies are chosen, the product will be exported without any location information or hierarchy-specific values.
Also in this step, the specific entries from the catalogue are chosen. When choosing a folder, it is possible to either include data about the folder itself or to also include everything under it.
- Finally, choose which attributes to include in the exported data.
For each exported catalogue entry, there will be at least one value line for each of these attributes. If an entry currently has no value for an attribute, a record will be included with a blank value, to make it easy for admin users to add one.
If no attributes are chosen, then the export will contain all attributes the product currently has, but will not contain any blank records in the case where no attribute value is present.
Once these steps have been followed the export can be generated. Depending on the complexity of the required data or the number of catalogue entries, it may take a long time to run. Because of this, a catalogue export cannot be run from an admin session that has unsaved changes.
Import file format
A catalogue import consists of a ZIP archive containing several files.
- datasource — this file is required. Its contents should describe the rest of the archive in human-readable form, they are simply logged for auditing purposes. If omitted, the entire archive is immediately rejected.
-
classification-<hierarchy>.xml —
This file describes the folder structure of a single hierarchy. Any number of these files can be included within the import.
<hierarchy name="catalogue"> <folder name="Computers"> <folder name="Desktop PCs" /> <folder name="Servers" /> <folder name="Laptops" /> </folder> </hierarchy>
Note that products are not included in this file format, only folders. Classification files can be omitted entirely if the hierarchy already exists, but take very little time to process so it is generally a good idea to include them. They can also contain partial hierarchy structures, as omitting a folder from the file does not cause it to be deleted. This can be especially useful when importing a file to create new products, as it allows the admin user to just provide the hierarchy structure required to position the products, without having to worry about areas of the catalogue they are not responsible for.
In addition to the <folder> tag there are a couple of tags used for modifying existing parts of the classification.
-
<rename>
This element simply renames a single folder in place. It cannot move it anywhere. For example:
… <folder name="Computers"> <folder name="Laptops"> <rename to="Notebooks" /> </folder> </folder> …
will rename a folder found at /Computers/Laptops so that it will instead be found at /Computers/Notebooks.
-
<move>
This element will permit renaming as above, but will also allow a folder to be moved to a new location in the hierarchy. This is slightly more complex, as the full path needs to be supplied for the new location, and therefore "/" characters in folder names need to be encoded as "|" (pipe), unlike with the <rename> tag. For example:
… <folder name="Computers"> <folder name="Laptops"> <move to="/Computers/Notebooks | Netbooks"/> </folder> </folder> …
In this case, the folder initially found at /Computers/Laptops will be effectively renamed to Notebooks / Netbooks, and would still be found under the /Computers folder.
In future versions of Mercatum, naming for hierarchies will be more restrictive in that folder names containing the "/" character will not be permitted. Folders will instead have both a hierarchy location identifier, and a translatable display name which will be able to contain any visible text. Until then, administrators must continue to encode these characters.
Note that there is no explicit way of deleting a folder from the hierarchy with a classification XML file. This is prone to data loss-causing errors, and is hence unsupported. Folders should be deleted manually through the administrator.
-
<rename>
-
attributes.csv — this file is only required if creating/modifying product data, or when modifying folder attributes. It is used to describe any attributes that should be changed by the import. If a value is encountered for any attribute not in this file, then an error will be generated and the line skipped, even if the attribute already existed.
The attribute descriptor is a 5-column CSV with the following fields:
name
The name field can either refer to a single attribute exactly, or it can be interpreted as a prefix by appending the "*" character. For example, in the case of specification attributes where all the attributes begin with the common prefix "mainspec-", the name field should contain the value "mainspec-*". Note that this does open up the possibility that an unwanted attribute could be created due to a typing mistake in the values file, should the attribute name still begin with "mainspec-".
type
The following types are supported. More details are available.
- string
The attribute may have a single value found within the accompanying value file. No special processing of that value is performed, it is stored as found.
- integer and number
The attribute may have a single value. It will be stored in the database as an integer or decimal number as appropriate, and so must be valid as such.
- price
The attribute may have a single value consisting of a number and a currency code separated by a space, e.g. 100.23 GBP. The currency can be omitted, if so the system's default currency will be assumed.
- boolean
The attribute may have a single valur that must be either "Y" or "N". It will be stored in the database as a boolean.
- string list
The attribute may have as many values as required, each one being a single record in the accompanying value file. For example, a product with 3 bullet points of feature text could be stored in the file as:
"id","attribute","value" ABC123,features,It's shiny ABC123,features,Does things ABC123,features,Makes toast
Note in the above example, the header line is included for reference only. In an actual import this line would not exist.
- product list
Similar to a string list, but each value in the accompanying file will be interpreted as a product code, creating a relationship between the node defined by the "id" column and the product identified by the value. If no such product exists, one will be created automatically but will not be associated with any particular hierarchies. Its data can then loaded at a later date — in fact it may even be included later on in the same import.
The type node list is also available. It should be used if the list attribute links to other folders instead of products. They can be used interchangeably as they are just synonyms, but appropriate use makes the attribute descriptor much more readable.
In an exported file, node list will always be used.
- classification
Attributes of this type are only valid on products, as they describe the location within a hierarchy that the product can be found. When used, the name of the attribute must be the name of the hierarchy, optionally (and recommended) prefixed with "location:".
Other than the name and type columns, the rest of the fields in the descriptor for a classification are irrelevant and are thus ignored.
The following example demonstrates how an import for a single product located in two hierarchies might look. First the attribute descriptor file:
"name","type","flags","origin","permitted values" location:catalogue,classification,,, location:some other hierarchy,classification,,, code,string,"unique,indexed",normal, description,string,"keyword",normal,
And now a sample set of values:
"id","attribute","value" ABC123,location:catalogue,/Computers/Laptops ABC123,location:some other hierarchy,/Laptops ABC123,code,ABC123 ABC123,description,A sample product
- string
flags
The value of this field is a comma-separated list of property names. Valid entries in this list are:
- indexed
- unique
- keyword
- required
- refineable
- enforced-restriction
- extends
These flags correspond to the attribute properties that can be set when creating new attributes. The enforced-restriction flag is only valid for value sets.
In some older versions of exported attributes.csv, it is possible to encounter the externally-indexed flag. This is just a legacy name for what is now refineable.
origin
This corresponds to the attribute's context-sensitive properties. The field value is a space-separated list of values from the following list:
- normal
- view
- inherited
- localised
permitted values
This field is only relevant for value sets, and should contain all the possible values for that set separated by the "|" (pipe) character.
N.B. take care when using value sets with import files. The contents of this field will be the only values permitted system-wide. If this should accidentally be left blank in a single import, it will have the effect of deleting every value for this attribute from the entire catalogue.
-
values.csv —
this file is likely to be the largest of any import, as it contains all the attribute value data.
It consists of a 3-column CSV with the following fields:
id
The "id" column within the value file should either contain a folder path or a product code. Folder paths should be encoded in a similar way to the <move> tag from the hierarchy structure XML document, i.e. replacing "/" characters in folder names with "|" (pipe).
When using a product code, it is a good idea to prefix it with the name of the code attribute being used. This is only necessary when using a system with multiple product code schemes, where it is acceptable to have two different products with the same code, one for each scheme. By providing scheme hinting information, the file can be imported without any ambiguity.
If it is omitted, the system will have a guess at what the user intended, based on their preferred product scheme. Take note, however, that the system will always favour modifying an existing product over creating a new one, so in the case of the two product schemes to create a product in the second scheme it is necessary to add the hinting to the id column.
A single import currently cannot contain a mixture of folder and product attribute changes. As generated exports will only contain attribute data for either folders or products, this is generally not a problem.
attribute
This column contains the name of an attribute from the descriptor file. If it does not resolve to a record in the descriptor, an error will be reported at the end, although the upload will continue to process beyond that record.
value
A single value for the attribute. In the case of list attributes, they may be multiple records for a single attribute, the first one encountered will clear the list. This makes it necessary to group all values for a list attribute into a contiguous block. It is a good idea to further group all attributes for a single product/folder together as well, as it decreases the time it will take to process the import.
To delete a value for an attribute, a record should be included with an empty value column. This also deletes the entire list in the case of a list attribute, therefore to delete a single value from a list requires that the rest of the entries still be included as separate records. This may seem like a lot of work when creating a file from scratch, but is simple to do when editing a file generated from a data export.
- defaults —
It is often useful to apply the same attribute changes to a large number of products in a batch, for example setting whether they are compatible with a new version of an operating system. In order to expedite this type of task, a catalogue import can include a special CSV file called defaults which contains a number of value records, each one will be applied to every product or folder mentioned in values.csv before any other value changes are performed.
The file is a 2-column CSV that behaves in the same way as the values file, but omits the "id" field. For example, given the following values.csv:
ABC123,code,ABC123 ABC123,new attribute,new value DEF456,code,DEF456 DEF456,new attribute,new value GHI789,code,GHI789 GHI789,new attribute,new value
all the lines for new attribute can be removed if the defaults contains the following line:
new attribute,new value
Note that the new attribute still needs to have a record in the attribute descriptor.
As an advanced feature on list attributes, it is possible to use the string "@preserve" as the value on the default record to prevent list attributes from being cleared. This means any encountered values for that attribute will be appended to the existing list, making it possible to use defaults in conjunction with lists without having to provide all the current contents. This technique can cause problems should an import fail part-way through processing, as it cannot be replayed as any new values will be appended a second time. Should this happen, it becomes necessary to generate a new export of the attribute and make the change in the verbose fashion.
- published.csv and unpublished.csv —
these two files will only work if the instance of Mercatum is running in "catalogue management" mode. They can be used to quickly publish or unpublish a large number of products to a server or set of servers.
Both files are 2-column CSVs containing the following fields:
id
This column works in the same way as the "id" column from the attribute values file.
server
This is the name of the external server, which can found on the sites section of the System Information tab. That section lists all the sites that Mercatum knows about, both local and remote, although in this case only the remote systems are relevant.
ABC123,site1 ABC123,site2
The contents of these files are interpreted as individual instructions, rather than being a definitive list. In the above example published.csv the system will ensure that the product ABC123 is published on both site1 and site2. However it could already be published to site3 and importing this file wouldn't change that. To guarantee that it ends up being unpublished anywhere else, records for all other remote sites would need to be included in unpublished.csv.
Either or both of these files are completely optional.