#duraspace IRC Log


IRC Log for 2014-02-12

Timestamps are in GMT/BST.

[6:54] * DuraLogBot (~PircBot@atlas.duraspace.org) has joined #duraspace
[6:54] * Topic is '[Welcome to DuraSpace - This channel is logged - http://irclogs.duraspace.org/]'
[6:54] * Set by cwilper!ad579d86@gateway/web/freenode/ip. on Fri Oct 22 01:19:41 UTC 2010
[8:25] * misilot (~misilot@p-body.lib.fit.edu) Quit (Ping timeout: 250 seconds)
[8:25] * misilot (~misilot@p-body.lib.fit.edu) has joined #duraspace
[13:05] * mhwood (mwood@mhw.ulib.iupui.edu) has joined #duraspace
[14:01] * tdonohue (~tdonohue@c-50-179-112-246.hsd1.il.comcast.net) has joined #duraspace
[14:30] * misilot (~misilot@p-body.lib.fit.edu) Quit (Quit: Leaving)
[15:10] * misilot (~misilot@p-body.lib.fit.edu) has joined #duraspace
[15:56] * awoods (~awoods@c-67-165-245-76.hsd1.co.comcast.net) Quit (*.net *.split)
[16:03] * awoods (~awoods@c-67-165-245-76.hsd1.co.comcast.net) has joined #duraspace
[17:21] * PeterDietz (~peterdiet@dietz72m1.lib.ohio-state.edu) has joined #duraspace
[17:39] * hpottinger (~hpottinge@mu-161244.dhcp.missouri.edu) has joined #duraspace
[20:00] <tdonohue> Hi All, it's time for our weekly DSpace Developers Mtg. Agenda: https://wiki.duraspace.org/display/DSPACE/DevMtg+2014-02-12
[20:01] <kompewter> [ DevMtg 2014-02-12 - DSpace - DuraSpace Wiki ] - https://wiki.duraspace.org/display/DSPACE/DevMtg+2014-02-12
[20:02] <tdonohue> We'll kick off today with some 4.1 updates... We still have 14 tickets marked for 4.1 but still unresolved: https://jira.duraspace.org/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+DS+AND+fixVersion+%3D+%224.1%22+AND+resolution+%3D+Unresolved+ORDER+BY+due+ASC%2C+priority+DESC%2C+created+ASC&mode=hide
[20:02] <kompewter> [ Issue Navigator - DuraSpace JIRA ] - https://jira.duraspace.org/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+DS+AND+fixVersion+%3D+%224.1%22+AND+resolution+%3D+Unresolved+ORDER+BY+due+ASC%2C+priority+DESC%2C+created+ASC&mode=hide
[20:03] <tdonohue> So, it seems like we are at a decision point of whether to (A) continue to chip away at those tickets (and delay 4.1), or (B) do we finalize the 4.1 release date, and anything that isn't ready will be rescheduled for 4.2?
[20:04] <mhwood> Quite a few of these are "code review needed".
[20:04] <tdonohue> It's also worth noting there are several of those "4.1 unresolved" tickets that have "code review needed"...there's a PR...it just needs reviewing & merging
[20:04] <tdonohue> (mhwood & I just noticed the same thing)
[20:04] <hpottinger> can anybody spare some cycles to test some PRs?
[20:05] <mhwood> For me it's often a question of whether I understand what that part of DSpace *should* do.
[20:06] <tdonohue> I unfortunately don't have any cycles left this week...I need to quickly prep our application for GSoC due on Fri :), and I already had/have a busy week
[20:06] * aschweer (~schweer@schweer.its.waikato.ac.nz) has joined #duraspace
[20:06] <helix84> I see nothing that I could tackle. Perhaps I could look at spellchecking.
[20:07] <tdonohue> Another option here is to do a quick "sanity check" PR review in a group...either today, or at a later time.
[20:07] <hpottinger> since duralogbot is keeping track for us, here are the issue numbers that have PRs that need testing: DS-1536, DS-1823, DS-1834, DS-1898, DS-1352, DS-1821, DS-1848
[20:07] <helix84> actually, I could look at DS-1860
[20:07] <kompewter> [ https://jira.duraspace.org/browse/DS-1536 ] - [DS-1536] having a DOT in handle prefix causes identifier.uri to be cut off when being created - DuraSpace JIRA
[20:07] <kompewter> [ https://jira.duraspace.org/browse/DS-1823 ] - [DS-1823] Move dspace.url to build.properties as an independent variable (default value in dspace.cfg can cause issues) - DuraSpace JIRA
[20:08] <kompewter> [ https://jira.duraspace.org/browse/DS-1834 ] - [DS-1834] Collection content source harvesting test does not check sets properly - DuraSpace JIRA
[20:08] <kompewter> [ https://jira.duraspace.org/browse/DS-1898 ] - [DS-1898] OAI not always closing contexts - DuraSpace JIRA
[20:08] <kompewter> [ https://jira.duraspace.org/browse/DS-1352 ] - [DS-1352] Itemimport replace issue - DuraSpace JIRA
[20:08] <kompewter> [ https://jira.duraspace.org/browse/DS-1821 ] - [DS-1821] Internationalize the bitstream access icon alt text - DuraSpace JIRA
[20:08] <kompewter> [ https://jira.duraspace.org/browse/DS-1848 ] - [DS-1848] OAI harvest issues when starting from control panel/command line - DuraSpace JIRA
[20:08] <kompewter> [ https://jira.duraspace.org/browse/DS-1860 ] - [DS-1860] Community-list doesn't show all collections - DuraSpace JIRA
[20:08] <tdonohue> whoops...just noticed "1823" is in that list. That is supposed to be rescheduled for 5.0 :/ My bad. I'll fix that.
[20:09] <hpottinger> with 1823 out, that's 6 PRs
[20:10] <tdonohue> It *sounds* like 1536 has now been verified by several non-Committers. It seems trustworthy to me
[20:10] <tdonohue> wait, 1536 was already merged, by mhwood
[20:11] * kstamatis (25067aca@gateway/web/freenode/ip. has joined #duraspace
[20:11] <mhwood> Yes, I see. Close it?
[20:11] <helix84> yes, it's open because the SQL (which is necessary to fix the damage) still hasn't been tested
[20:11] <mhwood> There seems to be more discussion.
[20:11] <helix84> don't close it yet
[20:11] <mhwood> Ah, that's right.
[20:11] <tdonohue> what is the "damage"? It's unclear what can happen
[20:12] <helix84> invalid URI metadata
[20:12] <hpottinger> wait, SQL is traceable, we can validate it without running it, right?
[20:13] <helix84> hpottinger: I haven't actually seen the damaged metadata. I'm pretty sure it will work, but that's if my assumptions are correct.
[20:13] <hpottinger> If Ivan says it works, and it looks like it would work, that's good enough for me
[20:13] <tdonohue> So, how do we plan to test this SQL? I.e. what's the strategy for closing this ticket? This ticket to me kinda stands in the way of 4.1
[20:13] <helix84> we need to find someone affected by the bug
[20:15] <tdonohue> that's too vague though, and possibly no one has seen damage. Three people who were affected collaborated on the fix (Jose Blanco, Alex Graca and Hilton Gibson), and it sounds like none of them need the SQL?
[20:15] <helix84> Hilton said they fixed it by hand
[20:15] <helix84> anyway, you know what?
[20:16] <tdonohue> So, what I'm saying is that, while the SQL is important, it really shouldn't "block" closing this ticket.
[20:16] <helix84> the SQL will just go to the release notes and that can be edited at any time
[20:16] <tdonohue> helix84++
[20:16] <helix84> so let's just put it there with a warning and close it
[20:16] <tdonohue> sounds like a good solution, helix84
[20:16] <mhwood> Yes.
[20:16] <helix84> assigned to me
[20:17] <tdonohue> thanks, helix84
[20:17] <mhwood> Yes, thanks!
[20:17] <helix84> oh... what is the new location for release notes?
[20:17] <helix84> since it changed I couldn't figure out what it's supposed to be for minor releases
[20:19] <tdonohue> For 4.1 official release notes, we could always just create a new subpage named "4.1 Release Notes" under https://wiki.duraspace.org/display/DSDOC4x/Introduction
[20:19] <kompewter> [ Introduction - DSpace 4.x Documentation - DuraSpace Wiki ] - https://wiki.duraspace.org/display/DSDOC4x/Introduction
[20:19] <helix84> for now I put them at the old location: https://wiki.duraspace.org/display/DSPACE/DSpace+Release+4.1+Notes
[20:19] <kompewter> [ DSpace Release 4.1 Notes - DSpace - DuraSpace Wiki ] - https://wiki.duraspace.org/display/DSPACE/DSpace+Release+4.1+Notes
[20:20] * kshepherd (~kim@wireless-nat-1.auckland.ac.nz) has joined #duraspace
[20:21] <tdonohue> In 4.0, we had two pages... Official Release notes: https://wiki.duraspace.org/display/DSDOC4x/Release+Notes and the "Unofficial Release Status page": https://wiki.duraspace.org/display/DSPACE/DSpace+Release+4.0+Status So, we should do something similar for 4.1
[20:21] <kompewter> [ Release Notes - DSpace 4.x Documentation - DuraSpace Wiki ] - https://wiki.duraspace.org/display/DSDOC4x/Release+Notes
[20:21] <kompewter> [ DSpace Release 4.0 Status - DSpace - DuraSpace Wiki ] - https://wiki.duraspace.org/display/DSPACE/DSpace+Release+4.0+Status
[20:22] <hpottinger> OK, I've skimmed the short list of 4.1 issues with PRs (5 by my count), I think DS-1834 DSPR430 is one that my repository has noticed, I'll assign myself as a reviewer
[20:22] <kompewter> [ https://jira.duraspace.org/browse/DS-1834 ] - [DS-1834] Collection content source harvesting test does not check sets properly - DuraSpace JIRA
[20:22] <helix84> remind me - why did we move the page?
[20:23] <tdonohue> helix84: because "Release Notes" should really be part of official docs. What we were previously calling "Release Notes" (prior to 4.0) was really our own status page...it was notes we were keeping about the release process, but not a good summary of the actual release.
[20:25] <mhwood> Thank you hpottinger.
[20:25] <helix84> I don't like how the release notes and list of changes are now separate pages... makes it hard to see what changed.
[20:25] <helix84> https://wiki.duraspace.org/display/DSDOC4x/Release+Notes
[20:25] <helix84> https://wiki.duraspace.org/display/DSDOC4x/Changes+in+4.x
[20:25] <kompewter> [ Release Notes - DSpace 4.x Documentation - DuraSpace Wiki ] - https://wiki.duraspace.org/display/DSDOC4x/Release+Notes
[20:25] <kompewter> [ Changes in 4.x - DSpace 4.x Documentation - DuraSpace Wiki ] - https://wiki.duraspace.org/display/DSDOC4x/Changes+in+4.x
[20:25] <hpottinger> DS-1821 DSPR#451 looks like an easy one to test
[20:25] <kompewter> [ https://jira.duraspace.org/browse/DS-1821 ] - [DS-1821] Internationalize the bitstream access icon alt text - DuraSpace JIRA
[20:25] <kompewter> [ https://github.com/DSpace/DSpace/pull/451 ] - DS-1821 Internationalize the bitstream access icon alt text by helix84
[20:26] <tdonohue> So, with 4.1, it sounds like we still lack a "release date". Do we want to set one? Do we still feel it's worth waiting to review & possibly scheduling a time to review what's left together?
[20:26] <mhwood> Changes *is* linked from Release Notes.
[20:27] <hpottinger> How about Feb 28, it's a Friday?
[20:27] <helix84> mhwood: that's my beef with it :)
[20:27] <tdonohue> helix84: We keep the changes page separate so that it can be at the *END* of the documentation. Sticking it into the beginning of the docs would mean the first 1/2 of the PDF would be links to JIRA tickets...you'd have to page forever to get to the "Installation Instructions"
[20:28] <mhwood> 28th sounds better than 14th, which was discussed. :-/
[20:28] <tdonohue> helix84: So, there's a balance...you want to explain what is new in the release (Release Notes) without overwhelming people with info (Changes in 4.x)
[20:28] <helix84> tdonohue: what about embedding it?
[20:29] <helix84> I know it's big for a .0 release, but pretty vital for a .x release. Anyway, we don't redo the PDF for minor releases, do we?
[20:29] <mhwood> I think I agree that release notes should be brief. Ours may already be too long.
[20:29] <tdonohue> helix84: I still disagree...while developers may be interested in the long list of JIRA tickets (Changes in 4.x), most people are not.
[20:29] <tdonohue> We always redo the PDF for minor releases (at least when we remember to)
[20:30] <helix84> let me put it this way - as a user, what I'm looking for in release notes is if it fixes a bug that's affecting me
[20:30] <mhwood> What I look for is "what do I have to watch out for when upgrading?"
[20:30] <tdonohue> helix84: check other software release notes. It tends to NOT include a list of every fix. It might highlight some major fixes...but they nearly always include a link saying "For a list of all fixes, please see..."
[20:31] <mhwood> Yes, I wouldn't want to wade through hundreds of patches looking for the "gotcha"s.
[20:32] <tdonohue> In any case, back to the 4.1 release schedule. I heard Feb 28 proposed
[20:33] <helix84> 3.1 = 3+14 issues; 3.2 = 1+12 issues fixed - very reasonable
[20:33] <helix84> sorry for the distraction. it's just that we changed away from something that I thought was very reasonable.
[20:34] <mhwood> 28th sounds reasonable.
[20:36] <tdonohue> helix84: not sure I understand what we changed. Technically, our Official Docs have always been structured the same...even in 3.0, we only had a "Summary" of the release at the beginning (back then we called it the "Preface"): https://wiki.duraspace.org/display/DSDOC3x/Preface
[20:36] <kompewter> [ Preface - DSpace 3.x Documentation - DuraSpace Wiki ] - https://wiki.duraspace.org/display/DSDOC3x/Preface
[20:36] <PeterDietz> hi all. Was in a real-life meeting
[20:36] <tdonohue> helix84: In a sense, all we did in 4.0 was to *rename* "Preface" to "Release Notes"
[20:36] <tdonohue> mhwood: sounds great. I think 28th also sounds reasonable for 4.1
[20:37] <mhwood> That's three for the 28th, and two are RT members.
[20:38] <tdonohue> Sounds like 28th is the date then, assuming no major objections from anyone else.
[20:38] <hpottinger> 2 RT members, and an honorary RT member
[20:38] <mhwood> 28th is now on my calendar.
[20:38] <tdonohue> Next steps of course would be to do some final reviews of the unresolved 4.1 tickets, and see what else we can resolve before the 28th. I suggest we use next week's JIRA Review Hour to review 4.1 unresolved tickets
[20:39] <tdonohue> If anyone is able to find time to chip in on 4.1 unresolved tickets, please do so ... we can use some help. It seems like we just need a final push to get 4.1 wrapped up
[20:39] <mhwood> We can do that. Meanwhile I need to think about how to move the remaining tickets forward.
[20:40] <hpottinger> I mentioned the 4.1 issues that need code review earlier, DSPR#451 looks like an easy one to sign off on
[20:40] <kompewter> [ https://github.com/DSpace/DSpace/pull/451 ] - DS-1821 Internationalize the bitstream access icon alt text by helix84
[20:41] <helix84> just noting that I will most likely miss Jira reviews until DST
[20:41] <mhwood> 451 is merged.
[20:41] <hpottinger> see? easy peasy :-)
[20:42] <tdonohue> So, it sounds like we have a general plan for 4.1, we just need more volunteers to chip in & help us wrap up the release. Should we put out a request to email list(s) for more help in 4.1? (-commit or -devel)?
[20:42] <helix84> kids these days... none reads any warning signs...
[20:42] <hpottinger> helix84, can you close DS-1821?
[20:42] <kompewter> [ https://jira.duraspace.org/browse/DS-1821 ] - [DS-1821] Internationalize the bitstream access icon alt text - DuraSpace JIRA
[20:43] <hpottinger> +1 call for volunteers, -devel and -commit
[20:43] <mhwood> 1821 was waiting on translations to percolate through Central.
[20:43] <mhwood> Yes, it sounds like we need to find people who can test these PRs.
[20:44] <tdonohue> RE: 4.1 call for volunteers, I'd recommend one of the RT send out that email... I need to make sure I get our GSoC application wrapped up.
[20:44] <hpottinger> I'll send the note, if there are no objections?
[20:44] <mhwood> None here.
[20:44] <tdonohue> go for it. It need not be complex, just send it :)
[20:44] <mhwood> Thanks.
[20:45] <tdonohue> Ok, in the interest of time, I'm gonna move along on our agenda (sounds like we have a 4.1 plan now)
[20:45] <tdonohue> Next up was Google Summer of Code. Cause of the flurry of activity today by the Committers, DuraSpace *WILL* apply for GSoC 2014. I'll get a draft going ASAP.
[20:46] <hpottinger> Yay (for the application, not for more work for tdonohue)
[20:46] <tdonohue> Please continue to add GSoC ideas / enhance ideas here: https://wiki.duraspace.org/display/GSOC/DSpace+Summer+of+Code+Ideas (The GSoC Ideas page is the *MOST* important part of the application)
[20:46] <kompewter> [ DSpace Summer of Code Ideas - Google Summer of Code - DuraSpace Wiki ] - https://wiki.duraspace.org/display/GSOC/DSpace+Summer+of+Code+Ideas
[20:47] <hpottinger> I already said this in e-mail, but if anyone sees a project they'd like to get picked up by GSoC, sign up as a mentor or co-mentor
[20:47] <tdonohue> But, beyond those two mentions, I don't have anything else to mention regarding GSoC. Once I have a draft application ready, I'll try to forward it on to -commit for any last minute feedback from Committers. The application is due on Fri
[20:47] <hpottinger> and sincere thanks to aschweer for doing so with a couple of projects
[20:47] <mhwood> helix84, we chatted earlier about metadata-for-all?
[20:48] <aschweer> hpottinger: no worries :)
[20:49] <helix84> mhwood: yes, sounds like a good idea to offload the UI work you don't like after doing the changes in the API
[20:50] <tdonohue> So, is there anything else to discuss regarding GSoC? Or shall we move on (for the final 10mins) to kshepherd's / hpottinger's topic of chatting about potentially adding "Streaming" to DSpace 5.0?
[20:50] <mhwood> metadata stuff *is* regarding GSoC.
[20:50] <hpottinger> pst, kshepherd, wake up :-)
[20:51] <kshepherd> i'm here ;)
[20:51] <mhwood> Though now I am struggling to recall what the UI concerns would be.
[20:51] <helix84> What I'm most curious is what is needed for streaming (server-side)? Isn't Accept-Range enough?
[20:51] <tdonohue> gotcha, mhwood. So, do you want to talk metadata stuff? I wasn't clear if you want feedback on this now, or if we talk about it via email?
[20:53] <hpottinger> helix84: accept-range *ought* to be enough, however, it doesn't quite work with PDF readers
[20:53] * aschweer (~schweer@schweer.its.waikato.ac.nz) Quit (Quit: leaving)
[20:53] <helix84> so what's missing?
[20:54] <hpottinger> if you uncomment the accept-range code, you get broken PDF downloads
[20:54] <mhwood> Where we stand on MD4All: Mark Diggory did some work on this back around PR#12 (now closed, unmerged). Having forgotten that, I recently did similar stuff. It's perhaps halfway done. Metadata support is moved from Item to DSpaceObject. Nothing is yet done to migrate object record fields to the MetadataValue table.
[20:55] <kshepherd> various versions of adobe cope differently with multipart pdfs, from what i've noticed (in non-dspace environments)
[20:56] <helix84> adobe as the browser plugin?
[20:56] <hpottinger> correct, helix84
[20:56] <kshepherd> yeh
[20:56] <hpottinger> but also the MacOS "preview" app
[20:57] <kshepherd> possibly in combination with 1 or more reverse proxy
[20:57] <helix84> and this discussion is about PDFs only or also video?
[20:57] <kshepherd> i can't remember the exact issue.. either a chunk gets interpreted as "eof", or the offsets get mixed up or something
[20:57] <kshepherd> sorry, i thought we were talking about multipart PDF
[20:58] <tdonohue> mhwood: my one worry about MD4All + GSoC is whether it can be "scoped" small enough for a Summer project. It still always sounds pretty large to me...it might be possible to do in a Summer though if we only had the student handle part of it (either API level or UI level changes).
[20:58] <helix84> no, I'm asking
[20:58] <hpottinger> discussion on the mail list about "pseudo streaming" http://dspace.2283337.n4.nabble.com/Psuedo-video-streaming-with-item-attachment-and-seeking-enabled-td4662291.html
[20:58] <kompewter> [ DSpace - Tech - Psuedo video streaming with item attachment and seeking enabled ] - http://dspace.2283337.n4.nabble.com/Psuedo-video-streaming-with-item-attachment-and-seeking-enabled-td4662291.html
[20:58] <kshepherd> re: video streaming, we already support http pseudo-streaming
[20:58] <kshepherd> if that's all anyone wants, then it's as simple as making a nice jwplayer item view or something ;)
[20:58] <kshepherd> but i was looking into RTMP/HLS/DASH streaming
[20:58] <tdonohue> mhwood: but, it'd require a mentor who is willing to help guide a student along...so, it might be a matter of whether you had time to do so, or someone else had time to do so
[20:58] <helix84> kshepherd: is http pseudo-streaming Accept-Range? and that is commented out by default?
[20:59] <kshepherd> so people can jump/seek through a 3 hour hi-def video without buffering
[20:59] <mhwood> tdonohue: that leads to helix84's idea to split off the UI work. But...what UI work is needed? Fixing removed getThisAndThat methods in the UIs is something I'll just do.
[20:59] <kshepherd> helix84: i believe so.. i'm still catching up on this "commented out by default" issue
[21:00] <hpottinger> helix84, I believe so, I've run that code uncommented on our repository, but we also embed PDF files
[21:00] <tdonohue> mhwood: some sample UI work would be to actually *allow* users to enter metadata on other objects from the UI (e.g. a new form for Collection/Community metadata, etc.)
[21:00] <kshepherd> and I was mostly interested in real streaming, not pseudo streaming
[21:00] <kshepherd> (of audio/video)
[21:00] <helix84> hpottinger: so could that code be branched not to be executed for PDFs?
[21:00] <helix84> hpottinger: just to deal with one issue at a time
[21:00] <tdonohue> mhwood: from the sounds of it, most of what you are doing is tweaking the APIs to support M4All...you aren't actually changing/building new UIs to capture that new metadata or allow editing of it.
[21:00] <mhwood> tdonohue: yes, that's it. The need to treat those form fields as first-class metadata with multiple values, language, etc.
[21:02] <tdonohue> mhwood: but, that being said, a student couldn't even *build* those UIs without the necessary APIs in place. So, this is only a viable GSoC project if we have a plan to get those APIs ready before GSoC starts
[21:02] <hpottinger> I'm interested in a solid solution to streaming, and would be most comfortable farming that task out to a dedicated service (where one could possibly scale out the infrastructure supporting it) instead of relying upon a single tomcat instance to do it all
[21:02] <mhwood> tdonohue: good point. I don't think I can have something usable ready that quickly. :-(
[21:02] <mhwood> hpottinger++
[21:03] <tdonohue> mhwood: in that case, this doesn't sound like something to enter in for GSoC, at least not this year. Perhaps next year it would be more doable.
[21:03] <mhwood> tdonohue: I agree. Thanks for working it out with me.
[21:03] <tdonohue> mhwood: no problem, any time
[21:03] <kshepherd> related question (to streaming): does anyone currently use nginx as a reverse proxy / load balancer in front of their DSpace instances?
[21:03] <mhwood> streaming: can we make it easy for external applications to get read-only access to assets?
[21:04] <helix84> hpottinger: no argument about that. I just want to point out that most small repositories won't be distributed - why not just get the stuff in DSpace (non-scalable) finished properly before moving up in complexity? Or am I misunderstanding?
[21:04] <mhwood> service applications, that is.
[21:04] <kshepherd> mhwood: yep, i've already successfully used the REST API bitstream/retrieve method, which uses HTTP
[21:04] <kshepherd> mhwood: though network bandwidth does matter..
[21:04] <helix84> mhwood: I think the question is what layer they will access it on
[21:04] <mhwood> I was thinking more in terms of "it's at this relative path on your NFS mount" or some such.
[21:05] <helix84> if we pipe it via REST then what's the point? we're not removing the bottleneck.
[21:05] <hpottinger> helix84: I understand, just when you talk about streaming, you really need to be prepared to scale
[21:05] <kshepherd> mhwood: yeh
[21:05] <kshepherd> well, another idea i had was for an [authenticated?] REST API request to respond with assetstore location
[21:05] <kshepherd> if you're on the same box
[21:05] <hpottinger> pipe by REST over localhost is almost like everything else we do with reverse proxy
[21:05] <kshepherd> or similar
[21:06] <kshepherd> uh
[21:06] <kshepherd> maybe i didn't explain that clearly
[21:06] <kshepherd> my first RTMP implementation integrated with DSpace by actually using dspace libs, the java API.. reading from assetstore
[21:06] <helix84> hpottinger: my point is if you pipe by REST why not pipe by XMLUI or JSPUI?
[21:06] <helix84> hpottinger: that was a rhetorical question
[21:07] <hpottinger> helix84: pipe to the RTMP server, RTMP server hands out the bits
[21:07] <mhwood> If you respond with a path relative to the assetstore directory, then the streamer(s) can be distributed if you want. They just have to be able to find that directory and access files under it *somehow*.
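A minimal sketch of the relative-path idea mhwood describes, assuming DSpace hands back a path relative to the assetstore and each streamer joins it to wherever it has mounted that directory. The function name, the mount point, and the sample path are hypothetical; nothing here is an existing DSpace API.

```python
from pathlib import PurePosixPath


def resolve_asset(mount_root: str, relative: str) -> PurePosixPath:
    """Join an assetstore-relative path to this streamer's own mount point.

    Hypothetical helper: rejects absolute paths and ".." segments so a
    response cannot point the streamer outside its assetstore mount.
    """
    rel = PurePosixPath(relative)
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"not a safe assetstore-relative path: {relative!r}")
    return PurePosixPath(mount_root) / rel


# Two streamers with different mount points resolve the same response.
print(resolve_asset("/mnt/assetstore", "12/34/56/example-bitstream"))
print(resolve_asset("/srv/media/dspace", "12/34/56/example-bitstream"))
```

Because only the relative part travels over the wire, each streamer can reach the files over NFS, SMB, or a local disk without DSpace knowing which.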
[21:07] <helix84> I like kshepherd's idea about passing assetstore location via REST
[21:07] <PeterDietz> so, you're saying your streaming tool would like to have direct access to the bitstream, for it to do its own cool stuff (i.e. streaming, seeking)? Or would it be feasible to have a way to have the bitstream endpoint support seeking.. i.e. these are the bytes to the object, starting at byte[5000000]
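For reference, "these are the bytes starting at byte[5000000]" is what an HTTP `Range: bytes=5000000-` request header asks for. The sketch below shows one way a server could turn a single-range header into inclusive byte offsets. It is an illustration only, not DSpace's bitstream code, and the function name is hypothetical; multi-range requests are deliberately rejected.

```python
from typing import Optional, Tuple


def parse_byte_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Return inclusive (start, end) offsets for a single-range header.

    Returns None when the header is malformed, uses several ranges,
    or cannot be satisfied for a resource of `size` bytes.
    """
    unit, _, spec = header.partition("=")
    if unit.strip().lower() != "bytes" or "," in spec:
        return None
    first, sep, last = spec.strip().partition("-")
    if not sep:
        return None
    try:
        if first == "":
            # Suffix form "bytes=-N": the last N bytes of the resource.
            n = int(last)
            if n <= 0 or size == 0:
                return None
            return (max(size - n, 0), size - 1)
        start = int(first)
        end = int(last) if last else size - 1
    except ValueError:
        return None
    if start >= size or end < start:
        return None
    return (start, min(end, size - 1))


print(parse_byte_range("bytes=5000000-", 10_000_000))
```

A server that honours the result would answer with status 206 and send only that slice, which is what lets a player seek without downloading the whole file.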
[21:07] <hpottinger> direct file access is OK, too
[21:08] <mhwood> You can mount the same directory read-only across NFS or SMB or SAN fabric or....
[21:08] <kshepherd> mhwood: one issue i'm currently having with the streamers i'm testing (nginx_rtmp, red5, crtmpserver) is that they all think it's ok to write seek and meta files to the same dir! so i have to extend that thinking a bit ;)
[21:08] <kshepherd> direct file access i think is ok, but presumes that you're either running the streamer on the same machine, or have NFS-style access to the assetstore
[21:08] <mhwood> Maybe a translucent filesystem with the assetstore underneath.
[21:08] <helix84> mhwood: unionfs or what do they call it these days?
[21:10] <kshepherd> it's also possible that we could, say, write a curation task to create meta and seek files from mp4s, etc., so they too could be read as dspace bitstreams instead of [sometimes badly] generated/cached by the streamer itself
[21:11] <helix84> lots of options here, we should start writing them all down on a wiki page
[21:11] <PeterDietz> I've sometimes had the idea that DSpace would house the preservation copy, of say large mp4, then maybe have a sync task that pushes a copy of mp4 to some streaming video server ingest API.. i.e. youtube-ingest, or whatever solution you choose
[21:12] <mhwood> Sounds like the streamers themselves need some rework. I'd be reluctant to let them scribble in the same tree with my carefully-collected media files.
[21:12] <hpottinger> mhwood++
[21:12] <kshepherd> so the reason i tested piping a file through the REST API to my streamer (on a different box) the other day is (a) i was already messing with REST API, but i get that retrieving a bitstream via a JSPUI or XMLUI URI is just as "RESTful", and (b) i just wanted to see how "loose" i could make this integration.. and yep, on a fast network, especially if you're ok with the streamer caching the video asset after it's pulled it down once, you can
[21:12] <kshepherd> mhwood: absolutely
[21:12] <kshepherd> wiki page is definitely a good idea
[21:13] <PeterDietz> I would assume that streaming servers want to do some pre-processing on the content. i.e. make various versions, such as small, medium, large, desktop, mobile, flash, ogg, ..., low-bandwidth
[21:13] <kshepherd> was waiting to see if people actually thought this was a good idea at all before thinking of making one ;)
[21:13] <hpottinger> +1 wiki page
[21:13] <kshepherd> PeterDietz: nope, not in my experience.. if you have nice mp4, it will want to create a seek file (with offsets for seeking) and some metadata, but only if that doesn't already exist
[21:14] <hpottinger> also, working demos in Vagrant-DSpace would be cool
[21:14] <kshepherd> PeterDietz: and not all streamers do the metadata job so well, so it might actually be good to have dspace help out with that ;)
[21:14] <mhwood> kshepherd: sounds promising to me.
[21:14] <hpottinger> I'd volunteer to build the demos
[21:14] <kshepherd> as far as actual video curation.. size/quality/etc, i think that's a curation job for someone at the institution usually, hard to automate that sort of thing without error
[21:15] <kshepherd> but maybe we can figure something out there too?
[21:15] <kshepherd> hpottinger: cool
[21:15] <mhwood> I agree about the curation issue.
[21:15] <kshepherd> i have working demos using Red5 (java), crtmpserver (c++, aka rtmpd), and nginx_rtmp module (c), but haven't put them on a public box yet
[21:16] <kshepherd> if anyone has some nice big high quality mp4s that i can use without getting lawyer letters, please link them on the wiki :)
[21:16] <helix84> if we can agree today that a part of the work definitely needs to be done we can add it to the GSoC ideas list
[21:16] <PeterDietz> ok, so maybe you've already done this, but say I want to watch video1.mpg, a file which lives in DSpace, but I want to start play-back at 50%. Instead of having the client directly hit DSpace/REST/bitstream/1, they instead hit streamer/video/1, which is on same network as DSpace machine, so can download the video file quickly, and then allow for seeking, since it would be seeking against a local temp file?
[21:17] <kshepherd> PeterDietz: sorry missed your comment about "preservation in dspace", "access copy in streamer cache/store" -- yep, i've had the same idea
[21:18] <kshepherd> PeterDietz: hmm.. i don't see that it matters
[21:18] <kshepherd> if this is true RTMP streaming (or HLS/DASH for future proofing), then all you get are chunks
[21:18] <kshepherd> the server does the work of finding out which chunk is 50% through the video, and sending them to you
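A rough sketch of the lookup kshepherd describes, assuming the streamer knows the duration of each chunk (the kind of information a seek file would hold). The function name and the list-of-durations input are assumptions for illustration; real RTMP/HLS/DASH servers keep their own index formats.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Sequence


def chunk_for_fraction(durations: Sequence[float], fraction: float) -> int:
    """Return the index of the chunk that contains the requested position.

    `durations` holds each chunk's length in seconds; `fraction` is the
    seek position as a share of the whole video (0.5 means halfway).
    """
    if not durations:
        raise ValueError("no chunks")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ends = list(accumulate(durations))  # cumulative end time of each chunk
    target = fraction * ends[-1]
    # First chunk whose end time lies beyond the target; clamp for 1.0.
    return min(bisect_right(ends, target), len(durations) - 1)


# Four 10-second chunks: halfway (20s) falls at the start of chunk 2.
print(chunk_for_fraction([10, 10, 10, 10], 0.5))
```

The client only ever asks for a position; the server maps it to a chunk and starts sending from there.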
[21:19] <kshepherd> my very old and clunky and not-indicative-of-solution-im-talking-about-today examples can be found at eg http://ampm.auckland.ac.nz/handle/id/87771
[21:19] <kshepherd> that's Red5 RTMP
[21:19] <kompewter> [ Competition performance ] - http://ampm.auckland.ac.nz/handle/id/87771
[21:19] <kshepherd> kompewter: well said
[21:20] <mhwood> You (the user) probably don't want the exact middle frame, but the chapter or whatever that's about halfway through.
[21:21] <kshepherd> oh, hrm.. yeh i think maybe some better 'bookmarking' can be done too? that could be another nice curation activity.. links on or near the player that will skip you to chapter 1,2,3 etc
[21:22] <helix84> that all sounds like something that should be the streaming server's responsibility. aren't we designing DSpace into being one?
[21:22] <helix84> ehm, not all, just the last part
[21:22] <kshepherd> but yeh, the "client" never accesses dspace/rest/bitstream/1, the streamer does. the client accesses rtmp://your.streamer.com/streams/1
[21:22] <kshepherd> helix84: well, the player maybe rather than the streamer
[21:22] <kshepherd> point taken
[21:23] <helix84> I mean it's fine if there's a bitstream with metadata about chapters/bookmarks - but it should be uploaded to DSpace, not generated by DSpace
[21:23] <kshepherd> but if things like seek/meta/bookmarks can be made generic to work with any streamer, then it seems useful (as a curation task, not as a core dspace thing)
[21:23] <mhwood> I agree with what I think I saw, that the streaming function should be external to DSpace whether it is local or remote.
[21:23] <helix84> kshepherd: sounds good
[21:23] <hpottinger> nice thing about this "farm out the derivatives to another service" approach is you could do the same with DJatoka
[21:25] <mhwood> Marking up the material with bookmarks sounds like a job for a human, using tools familiar to a video editor. DSpace should accept and store them. It might offer mechanical translation between bookmark file formats, if that's useful.
[21:29] <kshepherd> so when you get down to the basics, what we want shouldn't be too hard: "dspace gives me the file location, preferably in a nice restful way, streamer reads/caches it, and doesn't mess up assetstore with extra seek or metadata files"
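The spec above can be sketched from the streamer's side: fetch a bitstream once, keep it in the streamer's own cache directory, and never write anything next to the originals. Everything named here (`BitstreamCache`, the `fetch` callable, the `.bin` and `.part` suffixes) is hypothetical; `fetch` stands in for whatever HTTP or file access the streamer actually uses, and bitstream IDs are assumed to be simple names without path separators.

```python
from pathlib import Path
from typing import Callable


class BitstreamCache:
    """Streamer-side cache that keeps its files away from the assetstore."""

    def __init__(self, cache_dir: Path, fetch: Callable[[str], bytes]) -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch  # hypothetical: bitstream ID in, file bytes out

    def local_path(self, bitstream_id: str) -> Path:
        """Return a local copy of the bitstream, fetching it only once."""
        target = self.cache_dir / f"{bitstream_id}.bin"
        if not target.exists():
            # Write to a temporary name first so a failed fetch never
            # leaves a half-written file under the final name.
            partial = target.with_suffix(".part")
            partial.write_bytes(self.fetch(bitstream_id))
            partial.replace(target)
        return target
```

Seek and metadata files generated by the streamer would go in the same cache directory, so the assetstore can stay read-only to it.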
[21:29] <mhwood> Yes.
[21:29] <hpottinger> I like that spec
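
A minimal sketch of the spec quoted above, assuming the streamer pulls each original over HTTP and keeps it in its own scratch directory rather than touching the assetstore. The class name `DerivativeCache`, the helper `http_fetch`, and the example URL are all hypothetical illustrations; nothing here is an existing DSpace or streamer API.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


class DerivativeCache:
    """Read-through cache that a streamer could keep in its own scratch space.

    Hypothetical illustration: files are written only under cache_dir,
    never into the DSpace assetstore.
    """

    def __init__(self, cache_dir: Path, fetch: Callable[[str], bytes]):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch  # callable that retrieves the bytes for a URL

    def local_path(self, bitstream_url: str) -> Path:
        # Hash the URL so the cache file name is filesystem-safe.
        key = hashlib.sha256(bitstream_url.encode("utf-8")).hexdigest()
        target = self.cache_dir / key
        if not target.exists():
            # Cache miss: pull the original once and keep it locally.
            target.write_bytes(self.fetch(bitstream_url))
        return target


def http_fetch(url: str) -> bytes:
    """Plain HTTP GET; the URL layout is an assumption, not a DSpace contract."""
    with urlopen(url) as response:
        return response.read()
```

A streamer would build `DerivativeCache(Path("/var/cache/streamer"), http_fetch)` once and call `local_path(...)` before each playback, so the repository is contacted only on the first request for a given bitstream.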
[21:30] <mhwood> "Streamer is not allowed to write the assetstore. How you arrange that is up to you." I don't think it will be difficult.
[21:30] <hpottinger> if we can have the streamer pull the file via REST instead of relying on filesystem access I'd be even happier
[21:30] <mhwood> Uh, why?
[21:31] <helix84> hpottinger: you already have 2 such REST APIs - XMLUI and JSPUI
[21:31] <hpottinger> then the streamer can live wherever I have the hardware
[21:31] <kshepherd> hpottinger: yep i do have that demo'd, but if you want that i recommend you let the streamer cache them otherwise it has to pull them every time before it starts streaming
[21:31] <PeterDietz> Adding the file path isn't too weird to ask for. Adding additional data/field to API is a compatible thing to add
[21:31] <kshepherd> yeah forget that REST vs REST vs REST argument
[21:31] <kshepherd> hpottinger is asking for it to be pure HTTP, no direct file access necessary
[21:31] <mhwood> mount -t nfs dspace.example.com:/assetstore /mnt/assetstore
[21:32] * kstamatis (25067aca@gateway/web/freenode/ip. Quit (Ping timeout: 245 seconds)
[21:32] <kshepherd> yeh, i think that assumes a lot about network permissions etc., though.. if my streamer is some central "nz academic streaming service" they will not get NFS access to my stuff ;)
[21:33] <hpottinger> direct file access comes with security concerns
[21:33] <hpottinger> and you pay a heavy price if the streamer misbehaves
[21:33] <kshepherd> but i admit, nfs or similar would probably do the job for *my* own needs
[21:33] <mhwood> Export it readonly.
[21:34] <hpottinger> but, then, we're moving a dev concern into the realm of ops
[21:34] <hpottinger> making my job way, way more difficult :-)
[21:35] <kshepherd> anyway, dspace is already quite good at serving bitstreams over http so we won't worry about that for now ;)
[21:35] <tdonohue> HTTP > direct file access in my opinion too. Direct file access is fine if it's kept secure...but keeping it secure sometimes also means a harder/more complex setup/installation process.
[21:35] <kshepherd> that's a hack that would take place at the streamer end more than dspace
[21:36] <tdonohue> If you want to make streaming "dead simple" you need to make it "dead simple" to configure/setup...which likely means HTTP access to files
[21:36] <helix84> we already do support bitstream access over HTTP, if someone wants to use that, it's there. why are we even talking about it?
[21:36] <mhwood> I'm imagining that the assetstore is on a SAN box, and both DSpace and the streamer are getting assets via network.
[21:36] <kshepherd> helix84: i think we're just talking about it because streamers like the ones mentioned don't tend to have that as a "file retrieval" method yet
[21:36] <helix84> telling file location via a REST API is the new thing
[21:37] <mhwood> I actually have half of that (the DSpace half) in production here.
[21:37] <kshepherd> helix84: yeh, for dspace, yeh
[21:37] <helix84> I'm just saying - half of the discussion here was about adding a method to give HTTP access to the streamer - something that's already solved
[21:37] <kshepherd> no
[21:38] <kshepherd> at least, i hope not :P
[21:38] <kshepherd> i was talking about the streamers' lack of "getting files from http" ability, not dspace's lack of "serving files over http" ability ;)
[21:39] <kshepherd> which i know isn't strict *dspace* dev, but still required towards that particular solution
[21:39] <hpottinger> I was asking that we configure the streamer to pull bitstreams via HTTP (presumably through REST), instead of relying on filesystem access (which spooks me)
[21:39] <kshepherd> but i do like the file/nfs access stuff talked about so far too
[21:39] <mhwood> Exactly. The streamer should just see files. How it gets them is the sysadmin's problem.
[21:39] <helix84> kshepherd: yes, I understood that, sorry.
[21:40] <hpottinger> mhwood is cavalier about the sysadmin handoff because he wears both hats... I do not
[21:40] <tdonohue> hpottinger++ (I am of the same opinion...I don't like having another application mucking around, even if supposedly "read-only" in the assetstore)
[21:41] <helix84> the interesting point about not giving direct FS access is - the point of your streamers is scalability - and yet you insist on keeping the bottleneck
[21:41] <tdonohue> having another application mucking around in the assetstore is extremely dangerous unless you have a good sysadmin who knows how to properly limit the access rights to read-only (Which many of our 1,000+ DSpace institutions may not have)
[21:42] <tdonohue> If streaming needs direct filesystem access, it should be accessing a *cache* of the assetstore, or retrieving files via HTTP. Otherwise, I just worry we are making major assumptions around sysadmin expertise at most Dspace institutions.
[21:42] <hpottinger> My preference would be to keep as much of this solution in the app config and coding realm as possible, otherwise we are putting a lot of trust in ops getting things right
[21:43] <hpottinger> My ops people are *great* but they go through a lot of turnover
[21:43] <mhwood> Sometime you should hear ops talking about whether they can trust the coders to get anything right.
[21:43] <hpottinger> oh, sure, we throw rocks over the wall all the time :-)
[21:44] <mhwood> :-)
[21:44] <hpottinger> stupid wall (grumble grumble)
[21:47] <hpottinger> helix84: you'll only hit that REST link once, derivatives would be cached, you serve the derivatives
[21:47] <mhwood> What will do the caching?
[21:48] <hpottinger> the streamer keeps its derivatives, using the simple example of DJatoka, you just tell it to keep derivatives around and give it some space to keep them
[21:48] <kshepherd> i have to go soon, lots going on here right now
[21:49] <mhwood> Thanks for bringing up the streaming issues.
[21:49] <hpottinger> I imagine the RTMP options kshepherd is working with all do something similar
[21:49] <hpottinger> ok, so, kshepherd, are you going to make the wiki page?
[21:49] <kshepherd> ok
[21:50] <kshepherd> later today (or early tomorrow, depending on where you are in the world)
[21:51] <hpottinger> cool, as soon as you have some config information / how to stuff, I'll try setting up some branches of Vagrant-DSpace that'll demo the options
[21:52] <tdonohue> I'll also note another reason against direct filesystem/assetstore access...it's yet another "backdoor" (bypassing our existing APIs) which bypasses DSpace access rights, etc. We keep talking about how we wish we had a common "business logic layer". Having streaming bypass our API would seem to be a step backwards.
[21:53] <tdonohue> But, streaming in general is very high on our "wish list"...so, I'm willing to look at anything. Just noting my concerns with regards to filesystem access
[21:53] <mhwood> REST won't tell you where to find the file if you can't have it.
[21:53] <kshepherd> it does mean we could need some authN for REST API.. but to put it another way, it's a good excuse for authN in REST API :)
[21:53] <mhwood> Ugh, security by obscurity.
[21:54] <hpottinger> I don't think it's exactly security by obscurity... :-)
[21:54] <kshepherd> no, i don't see where that came from
[21:55] <mhwood> I mean, if it's a network mount, then the files are all there. You just don't know (from the far end) which is which, because DSpace won't tell you.
[21:55] <kshepherd> and why won't dspace tell you? if you're using the java api, just disable authZ manager, done
[21:55] <kshepherd> if you're using REST, just start iterating files
[21:55] <mhwood> I'm imagining that we will code it not to tell you, unless you are authorized to know
[21:56] <mhwood> Once REST has authN, it may not let those files through the iterator.
[21:56] <helix84> hpottinger: could you please check whether this SQL is valid for Oracle, too? https://jira.duraspace.org/browse/DS-1536?focusedCommentId=32835&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-32835
[21:56] <kompewter> [ https://jira.duraspace.org/browse/DS-1536 ] - [DS-1536] having a DOT in handle prefix causes identifier.uri to be cut off when being created - DuraSpace JIRA
[21:56] <mhwood> In fact it should not.
[21:56] <kompewter> [ [DS-1536] having a DOT in handle prefix causes identifier.uri to be cut off when being created - DuraSpace JIRA ] - https://jira.duraspace.org/browse/DS-1536?focusedCommentId=32835&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-32835
[21:56] <kshepherd> mhwood: well yep authN would be real nice for REST yeh
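
A rough sketch of the idea mhwood describes, assuming the REST layer reveals a file location only to callers authorized to read that bitstream and leaves everything else out of the iterator. The `Bitstream` record, its `allowed_readers` field, and both functions are hypothetical stand-ins for DSpace's real authorization policies, not its actual API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Bitstream:
    bitstream_id: int
    internal_path: str           # hypothetical location inside the assetstore
    allowed_readers: frozenset   # hypothetical stand-in for DSpace policies


def location_for(bitstream: Bitstream, caller: Optional[str]) -> Optional[str]:
    """Return the file location only if the caller is authorized to know it."""
    if caller is not None and caller in bitstream.allowed_readers:
        return bitstream.internal_path
    return None  # unauthenticated or unauthorized: reveal nothing


def visible_locations(
    bitstreams: Iterable[Bitstream], caller: Optional[str]
) -> Iterator[Tuple[int, str]]:
    """Iterate bitstreams, skipping any the caller may not read."""
    for bitstream in bitstreams:
        path = location_for(bitstream, caller)
        if path is not None:
            yield bitstream.bitstream_id, path
```

This only controls what the API discloses. It does nothing to stop a client that already has a filesystem mount from walking the directories itself.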
[21:57] <hpottinger> helix84 thinks I have a working Oracle test environment :-)
[21:57] <kshepherd> i just mean, if you can request one file, figure out the base NFS dir, and there's nothing particularly stopping you traversing all the directories... then you'll get all the files you want, even if the dspace admin didn't want you to. which i think is bad.
[21:57] <helix84> hpottinger: hey, you keep going on and on about how great DSpace-Vagrant is. I thought you did :)
[21:57] <tdonohue> kshepherd++
[21:58] <tdonohue> helix84: vagrant-dspace only supports postgres so far
[21:58] <hpottinger> helix84: Vagrant-DSpace *is* great, so is PostgreSQL :-)
[21:58] <mhwood> Hmmm...a filesystem layer that asks DSpace for permission before letting NFS see.... Starting to get complicated.
[21:58] * hpottinger omits derogatory words about *other* databases.
[21:59] <kshepherd> mhwood: well, that's why we're saying filesystem is inherently insecure
[21:59] <mhwood> I have a working Oracle 10g installation, but very little experience.
[22:00] <mhwood> I guess so, if you don't trust both ends of the link.
[22:00] <tdonohue> kshepherd++ again
[22:00] <hpottinger> all silliness aside, that SQL looks pretty ordinary to me, and I'd be very surprised if it didn't work as-is, though Oracle is *full* of surprises
[22:01] <mhwood> I worry about how it will perform, over HTTP. Video eats bandwidth like candy.
[22:01] <hpottinger> we need a whiteboard, I think
[22:02] <hpottinger> the streamer handles the public access to files, the REST connection is for the streamer to obtain originals for generating derivatives
[22:03] <hpottinger> just think of it as a caching reverse proxy
[22:03] <hpottinger> like Varnish
[22:03] <mhwood> Well, I don't seem to be saying anything new, and I've got to go. I will be interested to watch this develop.
[22:03] <mhwood> I've made a note to see if Oracle understands that bit of SQL tomorrow.
[22:04] <tdonohue> Honestly, a part of me starts to wonder: Why not use third-party hosted services like YouTube/Amazon CloudFront/whatever as the access point? Any audio/video files can be shipped off *once* to [external-service], and then a viewer is auto-embedded in the DSpace UI.
[22:04] <mhwood> Because then you're stuck with their viewer?
[22:04] <tdonohue> But, I know that the whole [external-service] model also may not be ideal for everyone.
[22:04] <kshepherd> that is valid though yeh
[22:05] <kshepherd> i haven't gotten much traction with that idea with colleagues i talk to.. they do like to do things locally
[22:05] <tdonohue> mhwood: but, if you have a *choice* of which external service to use (i.e. DSpace integrates with several), then you can choose the one that is best for you.
[22:05] <hpottinger> that's one approach, and would be fine for open access content
[22:05] <mhwood> Which one uses VLC? :-)
[22:06] <mhwood> Anyway, I meant the end-user is stuck with whatever service you chose, and their viewer.
[22:07] <helix84> RT, I need some help regarding release notes
[22:07] <kshepherd> is there not a service like that that will just host the rtmp endpoint, and let you embed whatever view / let the user use VLC if they want?
[22:07] <kshepherd> i mean i know youtube won't do that
[22:08] <helix84> I added the Ds-1536 documentation to the 4.1 release notes here: https://wiki.duraspace.org/display/DSPACE/DSpace+Release+4.1+Notes
[22:08] <kompewter> [ DSpace Release 4.1 Notes - DSpace - DuraSpace Wiki ] - https://wiki.duraspace.org/display/DSPACE/DSpace+Release+4.1+Notes
[22:08] <tdonohue> external services could potentially also work for restricted access content... if the video is limited access, it gets access-restricted in YouTube (for example), and the auto-embedded viewer only appears if you are authenticated in DSpace.
[22:08] <helix84> 1) This needs to be moved to a proper location, I don't know where - please do so
[22:08] <helix84> 2) we still need to put the SQL into the 5.0 release notes for those who upgrade 4.0 -> 5.0
[22:09] <tdonohue> the big downside to external services (like YouTube, etc) is that they *will* likely cost money at some point...there's a limit to how much you can store there. Plus, some folks will want or need (based on local policies) to have their content kept locally
[22:09] <kshepherd> yeh
[22:09] <mhwood> I made a note to look at the release notes issue for 1536 too. Must go now.
[22:09] <tdonohue> but, I still wanted to "throw it out there"... I fully realize that using an external service won't solve the problem for everyone...but it's "easier" in some ways
[22:10] * mhwood (mwood@mhw.ulib.iupui.edu) Quit (Remote host closed the connection)
[22:10] <hpottinger> helix84, that SQL does through an error (I just ran it against my live repository)
[22:10] <hpottinger> s/through/throw/
[22:10] <kompewter> hpottinger meant to say: helix84, that SQL does throw an error (I just ran it against my live repository)
[22:11] <hpottinger> I think the join syntax needs to use an explicit join of some sort
[22:11] <hpottinger> Error starting at line 4 in command:
[22:11] <hpottinger> AND metadatafieldregistry.metadata_field_id = metadatavalue.metadata_field_id
[22:11] <hpottinger> Error report:
[22:11] <hpottinger> Unknown Command
[22:11] <helix84> hpottinger: you mean on Oracle?
[22:12] <helix84> hpottinger: yes, I think there is a different syntax for natural joins
[22:12] <hpottinger> helix84: yes, on Oracle
[22:14] <helix84> hpottinger: I have no way of testing it, though. I have no writable Oracle (or a readable one with DSpace tables)
[22:14] <kshepherd> ok i'm off, thanks for the discussion all, i'll throw up a wiki page soon
[22:14] <tdonohue> Essentially, my message around Streaming: If we can find a way to build it such that we can support multiple streaming "options/plugins" (e.g. use YouTube or Red5 or whatever), it might be nice. But, starting *somewhere* is better than nothing :)
[22:14] <tdonohue> thanks kshepherd
[22:16] <helix84> hpottinger: could you please take care of the release notes stuff? I don't know what to do with it and I'm off to bed.
[22:17] <hpottinger> before I agree to "stuff" can you define? :-)
[22:17] <helix84> see above
[22:18] <hpottinger> so "stuff" = "document the SQL for DS-1536"
[22:18] <kompewter> [ https://jira.duraspace.org/browse/DS-1536 ] - [DS-1536] having a DOT in handle prefix causes identifier.uri to be cut off when being created - DuraSpace JIRA
[22:18] <helix84> I've already done that. See 1) and 2)
[22:18] <helix84> 23:08
[22:19] <hpottinger> 5.0 release notes?
[22:19] * kshepherd (~kim@wireless-nat-1.auckland.ac.nz) Quit (Quit: leaving)
[22:20] <hpottinger> as in "I don't know where the 5.0 release notes are, please make them for me and then move SQL information for DS-1536 into them?"
[22:20] <kompewter> [ https://jira.duraspace.org/browse/DS-1536 ] - [DS-1536] having a DOT in handle prefix causes identifier.uri to be cut off when being created - DuraSpace JIRA
[22:21] <hpottinger> sorry, was distracted by tinkering with Oracle SQL
[22:22] <helix84> nevermind, I'll just reopen the ticket
[22:23] <hpottinger> for the record, I'm still not sure what I agreed to do, but, I'm always happy to help out. If it appears that I haven't done what I said I'd do, assume I'm befuddled.
[22:24] <helix84> to clarify: the location of release notes changed in 4.0. There is no obvious location for me to put 4.1 relnotes, so I created them at the old location. They just need to be moved. This is 1)
[22:25] <helix84> 2) the same fix needs to be documented for the 4.0 -> 5.x upgrade, ergo in the 5.0 relnotes. What better time to create them than now? Lest we forget.
[22:26] * lo5an (~lo5an@unaffiliated/lo5an) has joined #duraspace
[22:27] <hpottinger> OK, I'll look into it, *and* I'll try to produce a clear explanation for future notes creation purposes
[22:27] <hpottinger> because that would be awesome, and I like doing awesome stuff
[22:28] <tdonohue> +1 to doing awesome stuff
[22:28] <helix84> hpottinger: thanks
[22:28] <helix84> good night all
[22:28] <hpottinger> today kinda/sorta counts as talking about features, I think
[22:30] <tdonohue> yea, it does...our "meeting" lasted 2+ hours, but hey... DSpace streaming!
[22:31] <hpottinger> looking forward to it
[22:52] * hpottinger (~hpottinge@mu-161244.dhcp.missouri.edu) has left #duraspace
[23:05] * tdonohue (~tdonohue@c-50-179-112-246.hsd1.il.comcast.net) has left #duraspace
[23:28] * PeterDietz (~peterdiet@dietz72m1.lib.ohio-state.edu) Quit (Ping timeout: 260 seconds)

These logs were automatically created by DuraLogBot on irc.freenode.net using the Java IRC LogBot.