Content Importer
Purple Ingest Configuration Guide
Technical guide for configuring Purple Ingest import settings.

Overview

Purple Ingest uses configuration files to control how content is imported, processed, and monitored. Configuration can be set at the team level (root) or overridden for specific paths (subfolders).

All configuration files must be uploaded before any content is uploaded. If content files and configuration files are uploaded at the same time, the content files may be handled first, in which case the configuration is not applied.

Configuration file types

Purple Ingest supports three types of configuration files, each managing different aspects:

| file | purpose | location | fields |
|---|---|---|---|
| config.yml | main configuration | teamid/config.yml (root only) | all fields |
| config.properties | path-specific convert & collector patterns | teamid/upload/[path]/config.properties | convert settings, epub patterns, sbarchive patterns |
| collector.yml | path-specific collector strategy | teamid/upload/[path]/collector.yml | collector strategy and basic settings only |

Recommendation: use config.yml for root-level settings (emailaddress, alarm, default collector, upload). For path-specific overrides, use collector.yml (collector strategy) and config.properties (convert settings, epub/sbarchive publication mappings and patterns).

Quick reference: collector configuration

| collector | collector.yml | config.properties |
|---|---|---|
| csv | strategy csv; additionalfiles (optional) | convert settings only |
| epub | strategy epub; additionalfiles (optional) | convert settings + optional publication mappings, patterns |
| name scheme | strategy name scheme; issuetime, productidprefix, issueprefixes; additionalfiles (optional) | convert settings only |
| sb archive | strategy sb archive; maxfolderstoprocess; additionalfiles (optional) | convert settings + required publication mappings, patterns |

File locations and path hierarchy

Root configuration

Located at team level, applies as baseline for all imports.

    teamid/
    └── config.yml    # root configuration (path = null)

Path-specific configuration

Located in
subfolders, overrides specific settings for that path.

    teamid/
    ├── config.yml                  # root configuration (all settings)
    └── upload/
        ├── daily/
        │   ├── config.properties   # daily specific convert settings
        │   └── collector.yml       # daily specific collector settings
        └── weekly/
            ├── config.properties   # weekly specific convert settings
            └── collector.yml       # weekly specific collector settings

Important: path-specific folders can only use config.properties or collector.yml. The config.yml format is only supported at root level.

Path resolution

- File uploaded to teamid/upload/issue.pdf → uses root config (teamid/config.yml)
- File uploaded to teamid/upload/daily/issue.pdf → uses daily config if it exists, otherwise root
- File uploaded to teamid/upload/weekly/issue.pdf → uses weekly config if it exists, otherwise root

config.yml (recommended)

The main configuration file, supporting all import settings.

Full example

    # enable or disable imports for this team
    enabled: true

    # email addresses for notifications
    emailaddress:
      - "admin@example.com"
      - "team@example.com"

    # collector configuration: how to gather issue information
    collector:
      # strategy: csv, epub, name scheme, or sb archive
      strategy: csv
      # additional files to include with each issue (optional)
      additionalfiles:
        - "readmodepackage.zip"
        - "supplementary.pdf"
      # issue time for name scheme collector (hh:mm format, optional)
      issuetime: "21:00"
      # product id prefix for name scheme collector (optional)
      productidprefix: "com.example.app"
      # issue prefixes for name scheme collector (optional)
      # maps filename prefixes to publication display names
      issueprefixes:
        "magazine": "magazine name"
      # sb archive collector settings (optional)
      maxfolderstoprocess: 3   # maximum folders per batch
      # note: sbarchive publication mappings and patterns must be configured
      # in config.properties, not here; see config.properties section below

    # upload configuration
    upload:
      # enable or disable uploads
      enabled: true
      # activation type: preview, release, or off
      activate: preview
      # default issue access type: free, paid, or locked (optional)
      defaultissueaccesstype: free

    # alarm configuration for file monitoring
    alarm:
      # enable or disable file alarms
      enabled: true
      # time before publication date when file is expected
      # format: "xh" (hours) or "xm" (minutes)
      start: 1h
      # duration after publication date to stop sending alarms
      # default: 24h
      end: 24h

    # convert configuration for content transformation
    convert:
      # global properties (applied to all transformers)
      global:
        normalresourceoutputtype: "pdf"
        normalresourceoutputdensity: "100"
      # pdf specific transformation settings
      pdf:
        # release version settings
        release:
          preview: "false"
          normalresourceoutputtype: "pdf"
          pagesperstage: "2"
          numberofpagesfromstart: "1"
          pageimageheight: null
        # preview version settings
        preview:
          preview: "true"
          normalresourceoutputtype: "pdf"
          numberofblurredpages: "2"
          numberofpagesfromstart: "1"
          pageimageheight: null
        # cover extraction settings
        cover:
          coverdensity: "50"
          coverfiletype: "png"

Field reference

enabled (boolean)

Controls whether imports are enabled for this team.

- Default: true
- Example: enabled: true

emailaddress (list of strings)

Email addresses to receive notifications.

- Always-on notifications: job completion, import errors
- Optional notifications: file alarms (require alarm enabled: true)
- All recipients are sent via bcc for privacy
- Can only be configured at root level; not supported in path-specific configs
- Example: ["admin@example.com", "team@example.com"]

collector (object)

Defines how issue information is collected.

Common collector settings

These settings apply to all collector types:

- strategy (required): collection method
  - csv
  - epub
  - name scheme
  - sb archive
- additionalfiles (optional): extra files to include with each issue
  - List of S3 object keys relative to team folder
  - Example: ["readmodepackage.zip", "supplementary.pdf"]
  - Applies to all collectors

Collector types

csv collector

When to use: files are uploaded after issues are pre-created via csv import.

Strategy: csv

How it works:

1. Issues are created in
   database via csv import
2. Content files are uploaded
3. Collector matches uploaded files to existing database issues by filename and path

Configuration:

    collector:
      strategy: csv
      additionalfiles:   # optional
        - "readmodepackage.zip"

No additional settings required.

epub collector

When to use: epub publications packaged in zip files, where metadata is extracted from the epub files.

Strategy: epub

How it works:

1. Extracts epub files from zip archive
2. Parses package.opf for Dublin Core metadata (dc:title, dc:date)
3. Maps epub title to publication name (with optional mapping via config.properties)
4. Looks up publication in Purple Publish
5. Creates issue dynamically with pattern-resolved properties

Configuration (collector.yml):

    strategy: epub
    additionalfiles:   # optional
      - "cover.jpg"

Optional pattern configuration (config.properties)

The epub collector supports publication mapping and pattern customization via config.properties in the same folder:

    # publication mapping
    publication=my magazine                    # static publication name (overrides dc:title)
    publication.magazine\ de=german edition    # dynamic mapping (dc:title → publication name)
    publication.magazine\ en=english edition

    # pattern configuration
    namepattern={d|dd mm yyyy|de}              # issue name pattern
    iosproductidpattern=com company {1} {d|yyyymmdd|en}
    androidproductidpattern=com company {1} {d|yyyymmdd|en}
    webproductidpattern=com company {1} {d|yyyymmdd|en}
    issuenumberpattern={1} {2}
    issuealiaspattern={1} {d|yyyy mm dd|en}
    eveningpublishingtime=18:00                # evening publishing adjustment

Spaces in property keys must be escaped with a backslash (\).

Pattern placeholders:

- {1}: epub title (dc:title)
- {2}: publication date in ISO format (yyyy-mm-dd)
- {d|format|locale}: publication date with custom format
  - Example: {d|dd mm yyyy|de} → "15 01 2025"
  - Example: {d|yyyymmdd|en} → "20250115"

Publication lookup priority:

1. publication.$epubtitle (dynamic mapping)
2. publication (static override)
3. Direct dc:title from epub (no mapping)

name scheme collector

When to use: automated imports where
issue information is encoded in the filename.

Strategy: name scheme

How it works:

1. Parses filename to extract publication and date information
2. Uses filename prefix to determine publication
3. Extracts date from filename pattern
4. Creates issue dynamically

Configuration:

    collector:
      strategy: name scheme
      issuetime: "21:00"                   # publication time (hh:mm format)
      productidprefix: "com.example.app"   # product id prefix
      issueprefixes:                       # map filename prefixes to publications
        "magazine": "magazine name"
        "daily": "daily news"

Settings:

- issuetime (optional): publication time
  - Format: hh:mm
  - Example: "21:00"
- productidprefix (optional): product id prefix
  - Example: "com.example.app"
- issueprefixes (optional): map filename prefixes to publication display names
  - Example: {"magazine": "magazine name"}

sb archive collector

When to use: batch processing of newspaper/magazine archives with METS XML metadata.

Strategy: sb archive

How it works:

Phase 1 (batch processing), triggered by do import file:

1. Scans data/ folder for issue folders
2. Extracts edition code from METS XML
3. Maps edition to publication name
4. Creates issues in database
5. Creates zip files in pickup/ folder

Phase 2 (job creation), triggered by zip file in pickup/:

1. Looks up issue from database
2. Creates importjob for processing

Configuration (collector.yml):

    strategy: sb archive
    maxfolderstoprocess: 5   # maximum issue folders per batch
    additionalfiles:         # optional
      - "cover.jpg"

Settings:

- maxfolderstoprocess (optional): maximum issue folders to process per batch
  - Default: 3
  - Example: 5

Required pattern configuration (config.properties)

sb archive requires publication mapping and pattern configuration in config.properties:

    # edition to publication mapping (required)
    publication.ed1=daily news a
    publication.ed2=daily news b
    publication.ed3=weekly magazine

    # static fallback (optional)
    publication=daily news a

    # pattern configuration
    namepattern={1} {d|dd mm yyyy|de}        # issue name pattern
    namepattern.ed1=dna {d|dd mm yyyy|de}    # edition specific override
    namepattern.ed2=dnb {d|dd mm yyyy|de}

    # product id patterns
    iosproductidpattern=sbarchive {2} {d|yyyymmdd|en}
    androidproductidpattern=sbarchive {2} {d|yyyymmdd|en}
    webproductidpattern=sbarchive {2} {d|yyyymmdd|en}

    # issue properties
    issuenumberpattern={d|yyyymmdd|en}
    issuealiaspattern={d|dd mm yyyy|de}

    # evening publishing adjustment
    eveningpublishingtime=21:00

Pattern placeholders:

- {1}: publication name from Purple Publish
- {2}: edition code from METS XML
- {d|format|locale}: publication date with custom format
  - Example: {d|dd mm yyyy|de} → "15 01 2025"
  - Example: {d|yyyymmdd|en} → "20250115"
  - Example: {d|eeee, dd mmmm yyyy|de} → "mittwoch, 15 januar 2025"

Edition mapping priority:

1. publication.$edition (edition-specific mapping)
2. publication (fallback for unmapped editions)

upload (object)

Controls upload behavior.

- enabled (boolean): enable uploads for this path
  - Default: true
- activate (string): activation mode
  - preview: publish to preview app
  - release: publish to release app
  - off: don't publish automatically
  - Default: preview
- defaultissueaccesstype (string, optional): default access type
  - free: free access
  - paid: requires purchase
  - locked: hidden until unlocked
  - Default: based on product ids

alarm (object)

Optional file monitoring and alarm settings. File alarms must be explicitly enabled by setting enabled: true. Can only be configured at root level; not supported in path-specific configs.

- enabled (boolean): enable file alarms
  - Default: false
  - When false, no file alarms are sent (job completion and import error notifications still work)
- start (string): time before publication date to expect file
  - Format: xh (hours) or xm (minutes)
  - Example: "1h" (1 hour before publication)
- end (string): duration after publication date to stop alarms
  - Format: xh (hours) or xm (minutes)
  - Default: "24h"

convert (object)

Content transformation settings.

- global (map): properties applied to all transformers
  - Key-value pairs of transformation settings
  - Example: normalresourceoutputtype: "pdf"
- pdf (object): pdf-specific settings
  - release: settings for release
    version
  - preview: settings for preview version
  - cover: cover extraction settings

config.properties (path-specific)

Properties file format for path-specific settings. Used for:

- Convert settings: content transformation properties
- epub collector patterns: publication mappings and pattern configuration
- sbarchive collector patterns: publication mappings and pattern configuration

Location

Must be placed in path-specific folders under /upload/:

- teamid/upload/daily/config.properties
- teamid/upload/weekly/config.properties
- teamid/upload/config.properties (root upload level)

Note: cannot be placed at team root level (teamid/config.properties).

Purpose by collector type

| collector | config.properties usage |
|---|---|
| csv | convert settings only |
| epub | convert settings + optional publication mappings and patterns (see epub collector section) |
| name scheme | convert settings only (all name scheme settings go in collector.yml) |
| sb archive | convert settings + required publication mappings and patterns (see sb archive collector section) |

Convert settings

These apply to content transformation for all collector types:

    # pdf conversion settings
    multipagepdf=true
    normalresourceoutputtype=pdf
    numberofnonepreviewstages=5
    maxpagesofpreviewversion=10
    tocthumbnailheight=500
    pagesperstage=2
    normalresourceoutputdensity=150
    # additional conversion properties as needed

Collector-specific patterns

For epub and sbarchive collector patterns (publication mappings, name patterns, product ids, etc.), see:

- epub collector section above for epub pattern configuration
- sb archive collector section above for sbarchive pattern configuration

Common pattern properties

Both epub and sbarchive collectors support these pattern properties in config.properties:

- publication: static publication name (fallback). Example: publication=my magazine
- publication.$key: dynamic publication mapping. Example: publication.ed1=daily news
- namepattern: issue name pattern. Example: namepattern={d|dd mm yyyy|de}
- namepattern.$key: key-specific name pattern override. Example: namepattern.ed1=dna {d|dd mm yyyy|de}
- iosproductidpattern: ios product id pattern. Example: iosproductidpattern=com app {1} {d|yyyymmdd|en}
- androidproductidpattern: android product id pattern. Example: androidproductidpattern=com app {1} {d|yyyymmdd|en}
- webproductidpattern: web product id pattern. Example: webproductidpattern=com app {1} {d|yyyymmdd|en}
- issuenumberpattern: issue number pattern. Example: issuenumberpattern={d|yyyymmdd|en}
- issuealiaspattern: issue alias pattern. Example: issuealiaspattern={d|dd mm yyyy|de}
- eveningpublishingtime: evening edition adjustment. Example: eveningpublishingtime=21:00

Important: placeholder meanings differ between collectors.

- epub: {1} = epub title, {2} = publication date
- sbarchive: {1} = publication name, {2} = edition code

How it works

When processed, all properties from config.properties are loaded into the convert.global map in importconfig:

    convert:
      global:
        # convert settings
        multipagepdf: "true"
        normalresourceoutputtype: "pdf"
        # collector patterns (if using epub or sbarchive)
        publication.ed1: "daily news a"
        namepattern: "{1} {d|dd mm yyyy|de}"
        # all other properties

Collectors read their patterns from convert.global, while convert settings are used by the transformation pipeline.

collector.yml (per path)

YAML file for path-specific collector settings.

Example:

    strategy: name scheme
    additionalfiles:
      - "readmodepackage.zip"
    issuetime: "21:00"
    productidprefix: "com.example.app"

Location

Must be placed in path-specific folders:

- teamid/upload/daily/collector.yml
- teamid/upload/weekly/collector.yml

Use case

Useful when different folders use different collector strategies:

- Root: csv collector for standard imports
- Special folder: name scheme collector for automated imports

How files work together

Partial updates

Each file type updates only its specific fields, leaving others untouched:

| file | updates | leaves unchanged |
|---|---|---|
| config.yml | all fields | - |
| config.properties | convert.global (includes convert settings and collector patterns) | emailaddress, collector.strategy, collector.maxfolderstoprocess, collector.additionalfiles, upload, alarm |
| collector.yml | collector (strategy and basic settings only) | emailaddress, convert, upload, alarm |

Override behavior

Configuration is stored per team+path in MongoDB. When multiple files target the same path:

1. The first file creates the config:

       # config.yml creates full config
       emailaddress: ["team@example.com"]
       alarm: { enabled: true }

2. Subsequent files update specific fields:

       # config.properties adds convert settings
       normalresourceoutputtype=pdf

   Result:

       emailaddress: ["team@example.com"]    # from config.yml
       alarm: { enabled: true }              # from config.yml
       convert:
         global:
           normalresourceoutputtype: "pdf"   # from config.properties

3. Each file independently updates its fields:

       # collector.yml updates collector
       strategy: epub

   Final result:

       emailaddress: ["team@example.com"]    # from config.yml
       alarm: { enabled: true }              # from config.yml
       convert:
         global:
           normalresourceoutputtype: "pdf"   # from config.properties
       collector:
         strategy: epub                      # from collector.yml

Deletion behavior

When a config file is deleted:

- If other fields exist, only that field is removed:

      # before: config with both collector and convert
      emailaddress: ["team@example.com"]
      collector: { strategy: csv }
      convert: { global: { } }

      # delete collector.yml

      # after: convert remains
      emailaddress: ["team@example.com"]
      convert: { global: { } }

- If no other fields exist, the entire config is deleted:

      # before: config with only collector
      collector: { strategy: csv }

      # delete collector.yml

      # after: config deleted entirely
      # (falls back to root or default)

Configuration examples

Example 1: simple team setup

Single configuration file for the entire team.

    teamid/
    └── config.yml

config.yml:

    enabled: true
    emailaddress:
      - "admin@example.com"
    collector:
      strategy: csv
    alarm:
      enabled: true
      start: 1h
      end: 24h
    upload:
      enabled: true
      activate: preview

Result: all uploads use this configuration.

Example 2: per-path collector strategy

Different collector strategies for different folders.

    teamid/
    ├── config.yml              # root config
    └── upload/
        ├── daily/
        │   └── collector.yml   # daily uses name scheme
        └── weekly/
            └── collector.yml   # weekly uses epub

teamid/config.yml:

    emailaddress:
      - "team@example.com"
    collector:
      strategy: csv   # default strategy
    alarm:
      enabled: true
      start: 1h

daily/collector.yml:

    strategy: name scheme
    issuetime: "21:00"

weekly/collector.yml:

    strategy: epub
    additionalfiles:
      - "cover.jpg"

Result:

- Files in teamid/upload/ → csv collector (from root)
- Files in teamid/upload/daily/ → name scheme collector (overridden)
- Files in teamid/upload/weekly/ → epub collector (overridden)
- All paths use the same emailaddress and alarm settings (from root)

Example 5: sbarchive batch processing

Archive batch processing with METS XML metadata.

    teamid/
    ├── config.yml                  # root config
    └── upload/
        └── archive batches/
            ├── collector.yml       # sbarchive collector strategy
            └── config.properties   # publication mappings and patterns

teamid/config.yml:

    emailaddress:
      - "archive team@publisher.com"
    collector:
      strategy: csv   # default strategy
    alarm:
      enabled: true
      start: 2h

archive batches/collector.yml:

    strategy: sb archive
    maxfolderstoprocess: 5
    additionalfiles:
      - "cover.jpg"

archive batches/config.properties:

    # edition to publication mapping (required)
    publication.ed1=daily news a
    publication.ed2=daily news b
    publication.ed3=weekly magazine

    # fallback publication
    publication=daily news a

    # issue name patterns
    namepattern={1} {d|dd mm yyyy|de}
    namepattern.ed1=dna {d|dd mm yyyy|de}
    namepattern.ed2=dnb {d|dd mm yyyy|de}

    # product id patterns
    iosproductidpattern=archive {2} {d|yyyymmdd|en}
    androidproductidpattern=archive {2} {d|yyyymmdd|en}
    webproductidpattern=archive {2} {d|yyyymmdd|en}

    # issue properties
    issuenumberpattern={d|yyyymmdd|en}
    issuealiaspattern={d|dd mm yyyy|de}

    # evening publishing adjustment
    eveningpublishingtime=21:00

Result:

- Root imports: use csv collector
- Archive batch imports (teamid/upload/archive batches/):
  - Process up to 5 issue folders per batch
  - Map edition codes (ed1, ed2, ed3) to publications
  - Custom name patterns for different editions
  - Evening editions dated to previous day at 21:00
  - Error notifications
    sent to configured email address
  - Additional cover.jpg file included with each issue

Example 6: epub collector with publication mapping

epub imports with dynamic publication mapping.

    teamid/
    ├── config.yml                  # root config
    └── upload/
        └── epub imports/
            ├── collector.yml       # epub collector strategy
            └── config.properties   # publication mappings and patterns

teamid/config.yml:

    emailaddress:
      - "epub team@publisher.com"
    collector:
      strategy: csv   # default strategy
    alarm:
      enabled: false

epub imports/collector.yml:

    strategy: epub
    additionalfiles:
      - "cover.jpg"
      - "supplementary.pdf"

epub imports/config.properties:

    # publication mapping
    publication.magazine\ de=german edition
    publication.magazine\ en=english edition
    publication.magazine\ es=spanish edition

    # static fallback for unmapped titles
    publication=default magazine

    # pattern configuration
    namepattern={d|dd mm yyyy|de}
    iosproductidpattern=com publisher {1} {d|yyyymmdd|en}
    androidproductidpattern=com publisher {1} {d|yyyymmdd|en}
    webproductidpattern=com publisher {1} {d|yyyymmdd|en}
    issuenumberpattern={1} {2}
    issuealiaspattern={1} {d|yyyy mm dd|en}
    eveningpublishingtime=18:00

Spaces in property keys must be escaped with a backslash (\).

Result:

- Root imports: use csv collector
- epub imports (teamid/upload/epub imports/):
  - epub with dc:title "magazine de" → publication "german edition"
  - epub with dc:title "magazine en" → publication "english edition"
  - epub with dc:title "magazine es" → publication "spanish edition"
  - Other epub titles → publication "default magazine"
  - Custom product ids based on epub title and date
  - Evening editions dated to previous day at 18:00
  - Additional cover and supplementary files included

Example 3: path-specific convert settings

Using a properties file for path-specific convert settings.

    teamid/
    ├── config.yml                  # root config
    └── upload/
        └── magazines/
            └── config.properties   # path specific convert settings

teamid/config.yml:

    emailaddress:
      - "admin@example.com"
    collector:
      strategy: csv
    alarm:
      enabled: true
      start: 2h
magazines/config.properties:

    normalresourceoutputtype=jpg
    normalresourceoutputdensity=300
    numberofnonepreviewstages=5

Result:

- Root imports: use config.yml settings, no special convert settings
- Magazine imports: use config.yml settings + convert properties from config.properties

Example 4: complete multi-path setup

Complex setup with different settings per publication.

    teamid/
    ├── config.yml                  # root baseline
    └── upload/
        ├── daily/
        │   ├── collector.yml       # daily collector
        │   └── config.properties   # daily convert settings
        └── weekly/
            └── collector.yml       # weekly collector

Root config.yml:

    emailaddress:
      - "admin@publisher.com"
    collector:
      strategy: csv   # default collector
    alarm:
      enabled: true
      start: 1h
      end: 24h
    upload:
      enabled: true
      activate: preview

daily/collector.yml:

    strategy: name scheme
    issuetime: "06:00"
    productidprefix: "com.publisher.daily"

daily/config.properties:

    normalresourceoutputtype=pdf
    pagesperstage=2

weekly/collector.yml:

    strategy: epub
    additionalfiles:
      - "readmodepackage.zip"

Result:

Daily imports (teamid/upload/daily/):

- emailaddress: ["admin@publisher.com"] (from root)
- collector: name scheme with morning time (from daily/collector.yml)
- convert: pdf with 2 pages per stage (from daily/config.properties)
- alarm: 1h start, 24h end (from root)
- upload: preview activation (from root)

Weekly imports (teamid/upload/weekly/):

- emailaddress: ["admin@publisher.com"] (from root, cannot be overridden)
- collector: epub with additional files (from weekly/collector.yml)
- convert: none (not specified)
- alarm: 1h start, 24h end (from root, cannot be overridden)
- upload: preview activation (from root)

Root imports (teamid/upload/):

- Uses all root config.yml settings

Best practices

Configuration strategy

✅ Do:

- Use config.yml for all settings when possible (simplest approach)
- Use path-specific configs only when truly needed
- Document your configuration structure
- Test configuration changes before deploying to production
- Keep notification settings (emailaddress, alarm) in root config only

❌ Don't:

- Mix formats
  unnecessarily (e.g., both config.yml and collector.yml for the same path)
- Create deep path hierarchies (2 levels maximum)
- Duplicate settings across multiple paths

File organization

Simple teams (recommended):

    teamid/
    └── config.yml   # single file, all settings

Multiple collector strategies:

    teamid/
    ├── config.yml              # base settings
    └── upload/
        └── special/
            └── collector.yml   # only override collector

Path-specific convert settings:

    teamid/
    ├── config.yml                  # root settings (alarm, emailaddress, default collector)
    └── upload/
        └── magazines/
            └── config.properties   # path specific convert settings

Testing configuration

1. Upload a config file to S3
2. Check MongoDB to verify settings were saved correctly
3. Upload a test file to trigger processing
4. Verify behavior matches expected configuration
5. Check notifications are sent to correct recipients

Common pitfalls

Pitfall: "My path-specific config isn't being used"

- Cause: file location doesn't match upload path
- Solution: ensure the config file is in the correct folder structure

Pitfall: "I need different email addresses for different paths"

- Reality: not supported; emailaddress can only be configured at root level
- Solution: use a shared email address or distribution list that forwards to appropriate recipients

Pitfall: "Convert settings disappeared after uploading collector.yml"

- Misunderstanding: files don't overwrite each other
- Reality: each file manages different fields independently
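
The partial-update, deletion, and fallback rules described in this guide can be modelled in a few lines. The following is a minimal Python sketch, not the actual Purple Ingest implementation: the function names (`apply_file`, `delete_file`, `resolve`), the in-memory dictionary standing in for the MongoDB store, the reduction of each file type to the top-level fields it owns, and the field-by-field overlay of a path config on the root config (inferred from the results listed in Example 4) are all assumptions made for illustration.

```python
from typing import Optional

# Top-level fields each file type owns, following the "Partial updates" table.
# None means "all fields". Reducing config.properties to "convert" and
# collector.yml to "collector" is an illustrative simplification.
OWNED_FIELDS = {
    "config.yml": None,
    "config.properties": {"convert"},
    "collector.yml": {"collector"},
}

# Hypothetical in-memory stand-in for the per team+path store (path None = root).
Store = dict[tuple[str, Optional[str]], dict]


def apply_file(store: Store, team: str, path: Optional[str],
               filename: str, fields: dict) -> None:
    """Create the config if missing, then update only the fields this file owns."""
    owned = OWNED_FIELDS[filename]
    config = store.setdefault((team, path), {})
    for key, value in fields.items():
        if owned is None or key in owned:
            config[key] = value


def delete_file(store: Store, team: str, path: Optional[str], filename: str) -> None:
    """Remove the fields owned by the deleted file; drop the config if it is now empty."""
    owned = OWNED_FIELDS[filename]
    config = store.get((team, path))
    if config is None:
        return
    for key in list(config):
        if owned is None or key in owned:
            del config[key]
    if not config:
        del store[(team, path)]


def resolve(store: Store, team: str, path: Optional[str]) -> dict:
    """Effective config: root settings overlaid with path-specific fields (assumed)."""
    effective = dict(store.get((team, None), {}))
    if path is not None:
        effective.update(store.get((team, path), {}))
    return effective


store: Store = {}
apply_file(store, "teamid", None, "config.yml",
           {"emailaddress": ["team@example.com"], "collector": {"strategy": "csv"}})
apply_file(store, "teamid", "daily", "config.properties",
           {"convert": {"global": {"normalresourceoutputtype": "pdf"}}})
apply_file(store, "teamid", "daily", "collector.yml",
           {"collector": {"strategy": "epub"}})

daily = resolve(store, "teamid", "daily")             # epub collector, root emailaddress
delete_file(store, "teamid", "daily", "collector.yml")
after_delete = resolve(store, "teamid", "daily")      # collector falls back to root (csv)
delete_file(store, "teamid", "daily", "config.properties")
daily_exists = ("teamid", "daily") in store           # config removed once it is empty
```

In this model, uploading collector.yml never touches `convert`, which is the behavior the last pitfall above describes; a folder with no stored config simply resolves to the root settings.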