Import Diigo shared bookmarks and add curation/rendering pipeline #244
Open
dckc wants to merge 17 commits into
- Rendering changes
  - Switch monthly links section from markdown bullets to HTML list markup (ul/li)
  - Render each bookmark in Diigo-like order:
    - title link
    - line break
    - metadata line with date first, then tags
    - line break
    - optional description paragraph
    - optional annotation quote blocks
  - Omit no_tag from visible per-item metadata
  - Remove textual labels from content blocks:
    - no tags: wrapper
    - no description: prefix
    - no annotations: heading
- Annotation handling
  - Render annotations as raw trusted HTML inside blockquote
  - Render annotation comments as note-styled quote blocks
- Support functions
  - Add helpers for HTML-focused output:
    - annotation_html(...)
    - inline_html_text(...)
    - visible_tags(...)
- Test/quality updates
  - Expand doctest assertions for new HTML output contract
  - Keep regression coverage for malformed title suffixes and multiline text
  - Re-run lint and doctest successfully before commit
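For reviewers, a minimal sketch of the per-item output order described in this commit message. It is not the code in monthly_bookmarks.py: the bookmark dict keys (url, title, date, tags, description, annotations), the render_item name, and the visible_tags signature are assumptions made for illustration, since the commit message elides the real signatures.

```python
from html import escape


def visible_tags(tags):
    """Drop the no_tag placeholder so it never appears in item metadata.

    Hypothetical signature; the commit message only names the helper.
    """
    return [t for t in tags if t != "no_tag"]


def render_item(bookmark):
    """Render one bookmark as an <li>, in the order listed above."""
    parts = [
        # title link
        f'<a href="{escape(bookmark["url"], quote=True)}">'
        f'{escape(bookmark["title"])}</a>',
        "<br>",
    ]
    # metadata line: date first, then tags, with no "tags:" label
    meta = [bookmark["date"]] + visible_tags(bookmark.get("tags", []))
    parts.append(escape(" ".join(meta)))
    parts.append("<br>")
    # optional description paragraph, with no "description:" prefix
    description = bookmark.get("description")
    if description:
        parts.append(f"<p>{escape(description)}</p>")
    # annotations are treated as trusted raw HTML, so they are not escaped
    for annotation in bookmark.get("annotations", []):
        parts.append(f"<blockquote>{annotation}</blockquote>")
    return "<li>" + "".join(parts) + "</li>"
```

Leaving annotations unescaped is only safe if the export is trusted, which is what the commit message states. Note-styled comment blocks are left out of the sketch.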
Align frontmatter tag casing across pages by normalizing exact tag tokens:
- html -> HTML
- rchain -> RChain
- uri -> URI
This is isolated as its own commit and only changes tag-list metadata lines.
Regenerated pages/**/bookmarks-YYYY-MM.md using the updated bookmark renderer and current t.co cache-only linkification path.
Add a canonical mixed-case tag list and map tag parsing through it so tags like KC/HTML/RChain/URI retain intended capitalization while all other tags still normalize to lowercase.
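A rough sketch of how such a mapping might look, assuming the canonical list is a plain tuple and lookup is by lowercase key. CANONICAL_TAGS and normalize_tag are hypothetical names, and the list holds only the four tags mentioned in the commit messages.

```python
# Hypothetical allowlist; the real list in monthly_bookmarks.py may be longer.
CANONICAL_TAGS = ("KC", "HTML", "RChain", "URI")

# Index the allowlist by lowercase form so any input casing matches.
_BY_LOWER = {tag.lower(): tag for tag in CANONICAL_TAGS}


def normalize_tag(raw):
    """Return the canonical casing if listed, otherwise the lowercase tag."""
    key = raw.strip().lower()
    return _BY_LOWER.get(key, key)
```

For example, normalize_tag("rchain") gives "RChain", while normalize_tag("Python") gives "python".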
Remove live URL expansion from monthly_bookmarks.py rendering. Linkification now resolves short URLs from the tco cache file only, adds --tco-cache CLI support, and threads cache data through post rendering.
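To illustrate what cache-only linkification could look like, here is a small sketch. It assumes the cache is a JSON object mapping each short URL to its expanded URL, which is a guess about the file format; linkify, load_cache, and TCO_RE are illustrative names only.

```python
import json
import re
from html import escape

# Matches t.co short links; the exact pattern used by the PR is not shown.
TCO_RE = re.compile(r"https?://t\.co/[A-Za-z0-9]+")


def load_cache(path):
    """Read the short-URL cache (assumed shape: {"short": "expanded"})."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def linkify(text, tco_cache):
    """Turn t.co URLs into links using only the cache, never the network."""
    out = []
    pos = 0
    for match in TCO_RE.finditer(text):
        out.append(escape(text[pos:match.start()]))
        short = match.group(0)
        # Fall back to the short URL itself when the cache has no entry.
        target = tco_cache.get(short, short)
        out.append(f'<a href="{escape(target, quote=True)}">{escape(target)}</a>')
        pos = match.end()
    out.append(escape(text[pos:]))
    return "".join(out)
```

Because rendering never touches the network, regenerating pages with the same cache file yields the same output.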
Introduce a standalone t.co expansion utility that supports ndjson or rendered markdown inputs, updates a JSON cache, and offers --next N batching with throttled progress output.
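The --next N batching can be pictured roughly as below. This is a sketch under assumptions, not the utility itself: expand_next is a hypothetical name, the resolver is passed in as a function so the example needs no network access, and throttling and progress output are omitted.

```python
import itertools


def expand_next(urls, cache, resolve, limit):
    """Resolve at most `limit` uncached short URLs and record them in `cache`.

    `resolve` is any callable mapping a short URL to its expanded form;
    the real script presumably performs an HTTP request here.
    Returns the number of URLs resolved in this batch.
    """
    # dict.fromkeys de-duplicates while keeping first-seen order.
    pending = (u for u in dict.fromkeys(urls) if u not in cache)
    done = 0
    for short in itertools.islice(pending, limit):
        cache[short] = resolve(short)
        done += 1
    return done
```

Running the same command repeatedly works through the backlog N entries at a time, since entries already in the cache are skipped.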
Add a checker script that scans page metadata tags, imports the canonical mixed-case allowlist from monthly_bookmarks.py, and exits non-zero on mismatches.
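A minimal sketch of the checking logic, assuming page tags have already been parsed into a dict of page path to tag list, and assuming tags outside the allowlist are expected to be lowercase. find_mismatches and check are hypothetical names; the real script's parsing and reporting are not shown in this PR text.

```python
def find_mismatches(page_tags, canonical):
    """Return (page, tag, expected) for every tag with the wrong casing."""
    by_lower = {tag.lower(): tag for tag in canonical}
    bad = []
    for page, tags in sorted(page_tags.items()):
        for tag in tags:
            # Allowlisted tags keep their canonical casing; others are lowercase.
            expected = by_lower.get(tag.lower(), tag.lower())
            if tag != expected:
                bad.append((page, tag, expected))
    return bad


def check(page_tags, canonical):
    """Print each mismatch and return a process exit status (0 = clean)."""
    bad = find_mismatches(page_tags, canonical)
    for page, tag, expected in bad:
        print(f"{page}: {tag!r} should be {expected!r}")
    return 1 if bad else 0
```

A script entry point would pass the result of check(...) to sys.exit so that CI fails on any mismatch.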
Exclude bookmark roundup pages from homepage post selection and homepage tag counts, and apply list-view presentation changes so roundup entries are visibly de-emphasized elsewhere.
Record initial resolved t.co URL mappings for cache-only bookmark rendering and repeatable link expansion workflows.
Restore the local shared-bookmarks ndjson export with the AWS-signin credential token redacted to satisfy push protection.
Create a converter that initializes a fresh Zotero DB from official source schema SQL and imports shared Diigo bookmarks as webpage items with URL/title/tags plus annotation notes.
- Why:
  - Keep a local, reviewable source snapshot for DB bootstrap fallback.
- What:
  - Add licensing notices.
  - Add only required schema SQL files.
- Scope:
  - No runtime behavior changes yet; data conversion logic remains separate.
- Schema compatibility:
  - Resolve item type and field IDs by name from the target DB.
  - Stop using hard-coded numeric IDs that mismatched Zotero 8.
- Bootstrap path:
  - Add --template-db support (defaulting to zero-items profile DB).
  - Copy template DB before import and assert it is empty (items=0).
  - Keep vendored SQL init as fallback when no template DB is found.
- Import behavior:
  - Create imported parent items as webpage, not dictionaryEntry.
  - Preserve updated_at into dateModified/clientDateModified.
  - Keep Diigo annotations as child notes and include import/readlater tags.
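The name-based lookup and the emptiness check might be sketched as follows. The table and column names (itemTypes.typeName, fields.fieldName, items) follow the Zotero client schema as far as I know, but treat them as assumptions. The helper names are hypothetical, and demo_db builds a tiny stand-in database with arbitrary IDs so the example runs without a real Zotero profile.

```python
import sqlite3


def item_type_id(conn, type_name):
    """Look up an item type ID by name instead of hard-coding it."""
    row = conn.execute(
        "SELECT itemTypeID FROM itemTypes WHERE typeName = ?", (type_name,)
    ).fetchone()
    if row is None:
        raise LookupError(f"item type not in target DB: {type_name}")
    return row[0]


def field_id(conn, field_name):
    """Look up a field ID by name from the target DB."""
    row = conn.execute(
        "SELECT fieldID FROM fields WHERE fieldName = ?", (field_name,)
    ).fetchone()
    if row is None:
        raise LookupError(f"field not in target DB: {field_name}")
    return row[0]


def assert_empty(conn):
    """Refuse to import into a DB that already holds items."""
    (count,) = conn.execute("SELECT COUNT(*) FROM items").fetchone()
    if count != 0:
        raise RuntimeError(f"template DB is not empty: items={count}")


def demo_db():
    """Stand-in for a copied template DB; the IDs here are arbitrary."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE itemTypes (itemTypeID INTEGER PRIMARY KEY, typeName TEXT);
        CREATE TABLE fields (fieldID INTEGER PRIMARY KEY, fieldName TEXT);
        CREATE TABLE items (itemID INTEGER PRIMARY KEY, itemTypeID INT);
        INSERT INTO itemTypes VALUES (7, 'webpage');
        INSERT INTO fields VALUES (3, 'url'), (5, 'title');
        """
    )
    return conn
```

Resolving IDs at import time means the same converter should keep working if a later schema renumbers item types, as long as the names stay stable.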
- Input:
  - projects/diigo-bak/diigo-bookmarks-shared.ndjson
- Baseline:
  - ~/Zotero zero-items/zotero.sqlite (Zotero 8 profile seed)
- Output:
  - projects/diigo-bak/diigo-zotero-vendored.sqlite
- Result:
  - Collection items imported as webpage type with Zotero 8-compatible schema rows.
This PR imports, curates, and publishes monthly shared-bookmark roundup pages, plus tooling and presentation updates to keep the workflow repeatable and safer.
Highlights:
- expand_tco_urls.py with cache + batching (--next N)

Closes #243
Refs #32