#duraspace IRC Log


IRC Log for 2014-08-20

Timestamps are in GMT/BST.

[0:01] * peterdietz (~peterdiet@162-231-22-3.lightspeed.clmboh.sbcglobal.net) has joined #duraspace
[0:31] * peterdietz_ (~peterdiet@server112.longsightgroup.com) has joined #duraspace
[0:33] * peterdietz (~peterdiet@162-231-22-3.lightspeed.clmboh.sbcglobal.net) Quit (Ping timeout: 244 seconds)
[0:33] * peterdietz_ is now known as peterdietz
[2:40] * peterdietz (~peterdiet@server112.longsightgroup.com) Quit (Ping timeout: 244 seconds)
[3:40] * awoods (~awoods@c-67-165-245-76.hsd1.co.comcast.net) Quit (Ping timeout: 272 seconds)
[3:53] * awoods (~awoods@c-67-165-245-76.hsd1.co.comcast.net) has joined #duraspace
[6:48] -asimov.freenode.net- *** Looking up your hostname...
[6:48] -asimov.freenode.net- *** Checking Ident
[6:48] -asimov.freenode.net- *** Found your hostname
[6:48] -asimov.freenode.net- *** No Ident response
[6:48] * DuraLogBot (~PircBot@ec2-107-22-210-74.compute-1.amazonaws.com) has joined #duraspace
[6:48] * Topic is '[Welcome to DuraSpace - This channel is logged - http://irclogs.duraspace.org/]'
[6:48] * Set by cwilper!ad579d86@gateway/web/freenode/ip. on Fri Oct 22 01:19:41 UTC 2010
[9:13] * pbecker (~pbecker@ubwstmapc098.ub.tu-berlin.de) has joined #duraspace
[12:18] * mhwood (mwood@mhw.ulib.iupui.edu) has joined #duraspace
[12:28] * tdonohue (~tdonohue@c-98-215-0-161.hsd1.il.comcast.net) has joined #duraspace
[14:04] * awoods (~awoods@c-67-165-245-76.hsd1.co.comcast.net) Quit (Ping timeout: 260 seconds)
[14:04] * awoods (~awoods@c-67-165-245-76.hsd1.co.comcast.net) has joined #duraspace
[16:24] * pbecker (~pbecker@ubwstmapc098.ub.tu-berlin.de) Quit (Quit: Leaving)
[17:56] * kohts (5b4e6f33@gateway/web/freenode/ip. has joined #duraspace
[18:10] * terryb2 (~anonymous@ has joined #duraspace
[18:18] * hpottinger (~hpottinge@ has joined #duraspace
[19:17] * peterdietz (~peterdiet@server112.longsightgroup.com) has joined #duraspace
[19:54] * KevinVdV (~KevinVdV@d5153D041.access.telenet.be) has joined #duraspace
[19:57] * robint (5eaf588c@gateway/web/freenode/ip. has joined #duraspace
[19:57] * cknowles (~cknowles@cpc19-sgyl35-2-0-cust180.18-2.cable.virginm.net) has joined #duraspace
[20:00] <peterdietz> hi all
[20:01] <robint> hi peterdietz
[20:01] <cknowles> hello
[20:01] <tdonohue> Hi all, it's time for our weekly DSpace Developers meeting! Welcome
[20:01] <tdonohue> Today's general agenda: https://wiki.duraspace.org/display/DSPACE/DevMtg+2014-08-20
[20:01] <kompewter> [ DevMtg 2014-08-20 - DSpace - DuraSpace Wiki ] - https://wiki.duraspace.org/display/DSPACE/DevMtg+2014-08-20
[20:02] <robint> hi cknowles :)
[20:02] <tdonohue> There's quite a few "gaps" in the agenda for today...so, we have plenty of opportunities to bring up topics to discuss
[20:02] <tdonohue> And, yes, hi cknowles! welcome!
[20:03] <tdonohue> First and foremost though...usual reminders on DSpace 5.0:
[20:03] <cknowles> :)
[20:03] <tdonohue> 1. "Deadline for Feature Pull Requests" is Oct 6!
[20:04] <tdonohue> 2. We still really do need some DSpace 5.0 Release Team members...even folks who can only chip in a bit. We're kinda running on a smaller than normal team, as I've mentioned...I *hope* we'll still be able to keep up with PRs that come flying in, but more help is really needed.
[20:05] * hpottinger thinks there are some other volunteers out there, they just need to stand up :-)
[20:05] <tdonohue> So, please please please. If you are reading this (or the logs later), consider taking part in the 5.0 Release Team!
[20:06] <tdonohue> And if there are volunteers, or people considering it, feel free to get in touch and ask questions. I'd be glad to answer any questions, as I'm sure hpottinger & peterdietz would as well (they are the current 5.0 RT members)
[20:07] <peterdietz> Also, cheers to everyone who has been working on reviewing / merging the existing PR backlog. We're below 80, for the time being
[20:08] <tdonohue> Moving along for now... main topic for today is just talking more about 5.0 (especially any new features folks are working on, or wanting more eyes on / early review of -- as that's the first deadline coming up).
[20:08] <tdonohue> So, if anyone has any 5.0 related tickets/PRs they'd like eyes on, or discussion around, I'll open the floor
[20:09] <peterdietz> If anyone uses Batch Import, i would like some eyes on DSPR#591, as that allows it through XMLUI
[20:09] <kompewter> [ https://github.com/DSpace/DSpace/pull/591 ] - DS-1641 Batch ItemImport from XMLUI by peterdietz
[20:10] <peterdietz> But other than that, I've been watching as Mirage2 got merged in. And we've been having feedback on the ImageMagickPDFThumbnail tool DSPR#603
[20:10] <kompewter> [ https://github.com/DSpace/DSpace/pull/603 ] - DS-2105: Create Thumbnails Using ImageMagick by terrywbrady
[20:10] <tdonohue> peterdietz: at a glance, PR#591 looks cool. Did the better exception handling get added in?
[20:11] <terryb2> DS-2105 I pushed up changes a few minutes ago to make the class names more intuitive.
[20:11] <kompewter> [ https://jira.duraspace.org/browse/DS-2105 ] - [DS-2105] Provide ImageMagick / Ghostscript Filter Media Plugin for Thumbnail Generation - DuraSpace JIRA
[20:11] <mhwood> DS-1577 needs attention from people who understand features that it touches but I don't (yet) use.
[20:11] <kompewter> [ https://jira.duraspace.org/browse/DS-1577 ] - [DS-1577] Replace dependency on commons-httpclient - DuraSpace JIRA
[20:12] <hpottinger> a big one that's going to need more eyes is from @mire: DSPR#612
[20:12] <kompewter> [ https://github.com/DSpace/DSpace/pull/612 ] - DSpace ORCID (proper pull request) by KevinVdV
[20:12] <peterdietz> ohh yeah, it's robust, or well, more robust.. robuster? If there is an exception during import, it will remove whatever it imported partially. Your sys-admin might be okay dealing with a map-file, but the end user shouldn't deal with a map file: https://github.com/LongsightGroup/DSpace/commit/913218954c261d962dc3eb0a9d160c3b8bddf58a
[20:12] <kompewter> [ BatchItemImport through UI to use addItemsAtomic - all or nothing · 9132189 · LongsightGroup/DSpace · GitHub ] - https://github.com/LongsightGroup/DSpace/commit/913218954c261d962dc3eb0a9d160c3b8bddf58a
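The "all or nothing" behaviour described above can be sketched roughly as follows. This is only an illustration of the rollback idea, not the code from the linked commit: FakeRepository and addItemsAllOrNothing are hypothetical names, and the real ItemImport code may work quite differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names are DSpace APIs.
class AtomicImportSketch {

    /** Stand-in for a repository; it only stores item names. */
    static class FakeRepository {
        final List<String> items = new ArrayList<>();

        void add(String name) {
            if (name.isEmpty()) {
                throw new IllegalArgumentException("empty item name");
            }
            items.add(name);
        }

        void remove(String name) {
            items.remove(name);
        }
    }

    /** Adds every item in the batch, or none of them if any add fails. */
    static boolean addItemsAllOrNothing(FakeRepository repo, List<String> batch) {
        List<String> added = new ArrayList<>();
        try {
            for (String name : batch) {
                repo.add(name);
                added.add(name);
            }
            return true;
        } catch (RuntimeException e) {
            // Undo the partial import, most recent item first.
            for (int i = added.size() - 1; i >= 0; i--) {
                repo.remove(added.get(i));
            }
            return false;
        }
    }

    public static void main(String[] args) {
        FakeRepository repo = new FakeRepository();
        System.out.println(addItemsAllOrNothing(repo, Arrays.asList("a", "b")));
        System.out.println(addItemsAllOrNothing(repo, Arrays.asList("c", "", "d")));
        System.out.println(repo.items);
    }
}
```

The point of the design is that a failed batch leaves nothing behind, so an end user never has to clean up with a map file.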
[20:12] <tdonohue> terryb2: thanks for the updates to 2105 / PR#603!
[20:12] <mhwood> Yes, I want to find time to get a look at ORCID support.
[20:13] <hpottinger> that would be cool, yeah, one thing we all might consider is clearing some space on the calendar to play with some of the bigger PRs
[20:14] <tdonohue> Stepping back...we've had a ton of things at once here...let's start at PR#591 (Since it made it in first, and we'll move along in order, if that sounds good?)
[20:14] <tdonohue> I wonder if there's anyone with some time to spare to help test out PR#591? Make sure it looks good, etc?
[20:14] <mhwood> Just let me mention DS-2107 to get it on the list....
[20:14] <terryb2> For the batch import, what post import processing should take place?
[20:14] <kompewter> [ https://jira.duraspace.org/browse/DS-2107 ] - [DS-2107] Provide a place for third-party plugins - DuraSpace JIRA
[20:15] <terryb2> (example: filter-media, update index)
[20:16] <tdonohue> terryb2: I think the answer is that #591 is just using "ItemImport" (normally a commandline tool, but now being made available in the UI). By default, it's just going to do the default processing of updating indexing... you'd have to still "schedule" things like media-filtering, etc for overnight
[20:17] <peterdietz> Right, it only imports.. No more / no less than the command line. That's why I've been building media-filter curation tasks: https://www.youtube.com/watch?v=mDufeaNfH0o
[20:17] <kompewter> [ DSpace Curation Task - Generate PDF Thumbnail from XMLUI - YouTube ] - https://www.youtube.com/watch?v=mDufeaNfH0o
[20:18] <terryb2> We have a custom bulk import process that runs outside of DSpace. It has been very helpful to initiate the post import tasks for the users.
[20:18] <robint> I don't think #591 should need much reviewing since it's not messing with core code
[20:19] <tdonohue> robint: I think I'm mostly wanting verification to ensure it "works", and that it doesn't break the commandline item-import command (as it does modify it a bit)
[20:19] <tdonohue> robint: i.e. a quick "sanity check" :) I agree though, it looks good
[20:20] <robint> I'm off on holiday next week but might get a wee chance on Friday
[20:20] <hpottinger> I've seen a video of #591 in action
[20:20] <cknowles> if robin runs out of time I will look at picking up next week
[20:20] <tdonohue> don't get me wrong, I trust peterdietz :) I just like to see a sanity check here....cause our sanity checks on Mirage 2 (for example) showed it wasn't building properly
[20:20] <mhwood> I've added the issue to my watch list, and may get a look at it.
[20:21] <tdonohue> thanks robint & cknowles (and mhwood)!
[20:21] <peterdietz> The main change it makes to the existing command-line item-import is a bug fix: command-line item import choked on ZIP files, so I've fixed that bug in the zip import.. I can close my other PR which fixes zip import
[20:21] <tdonohue> So, it sounds like we have several volunteers for #591 which is awesome
[20:22] <tdonohue> and yes, peterdietz, if this makes your other PR obsolete, then probably best to close the old one
[20:22] <hpottinger> peterdietz: does this other PR have a matching Jira? perhaps link #591 to it, too?
[20:23] <tdonohue> So, for sake of time, I'm moving on. Next from our list above: DS-2105 / DSPR#603
[20:23] <kompewter> [ https://jira.duraspace.org/browse/DS-2105 ] - [DS-2105] Provide ImageMagick / Ghostscript Filter Media Plugin for Thumbnail Generation - DuraSpace JIRA
[20:23] <kompewter> [ https://github.com/DSpace/DSpace/pull/603 ] - DS-2105: Create Thumbnails Using ImageMagick by terrywbrady
[20:23] <tdonohue> which is thanks to terryb2!
[20:24] <terryb2> Please check out the changes and let me know if they look good for use by others.
[20:24] <terryb2> We use the bitstream description field to mark "custom" thumbnails that should not be replaced by the generator.
[20:25] <peterdietz> @terryb2 I've just left a comment on the PR, about simplifying the naming. I.e. I don't think it needs to be prefixed with IMImageMagick, just ImageMagick then whatever it is doing
[20:25] <tdonohue> +1 to peterdietz latest comment. Overall though, I think this is looking good, and I hope we can get it into 5.0 (and kudos to terryb2 for his responsiveness here)
[20:26] <tdonohue> Again, this probably needs a sanity check (post the minor cleanup of names, etc.)
[20:27] <peterdietz> Once this PR gets in, ImageMagickPdfThumb, I'll likely make a PR bringing in richardrodgers' media filter curation tasks. i.e. first generation will have curationtasks that basically just "call" the existing media filters, perhaps later, the media-filter code can be peeled away, and moved to curation. i.e. very similar to the youtube video I pasted a moment ago
[20:28] <tdonohue> Anyone want to volunteer to help terryb2 finish this one up and give it a "sanity check"?
[20:28] <tdonohue> +1 peterdietz for idea of creating a way to run media-filters as curation tasks
[20:29] <peterdietz> It's all Richard, but I was first to wire it up, and make a video: https://github.com/richardrodgers/ctask/tree/master/mediafilter
[20:29] <kompewter> [ ctask/mediafilter at master · richardrodgers/ctask · GitHub ] - https://github.com/richardrodgers/ctask/tree/master/mediafilter
[20:29] <terryb2> +1 to making media filters runnable from the application
[20:30] <tdonohue> peterdietz: any chance you are willing to give terryb2's code a sanity check run (after the little bit of remaining cleanup)?
[20:31] <terryb2> If a filter media task is resource intensive, can it safely execute from the curation process without impacting the tomcat server?
[20:32] <KevinVdV> And can these curation tasks be run in a thread ? Because running filter media might take a long time no ?
[20:32] <mhwood> Tasks can be queued for batch processing later.
[20:32] <tdonohue> terryb2 and KevinVdV: curation tasks can either be run immediately (which would impact Tomcat server) or "queued" (in which case you need to have a scheduled cron-job which "runs the queue" on whatever schedule you want behind the scenes)
[20:32] <peterdietz> good question, I can't recall taxing tomcat such that I ever got a heap/dump OOM / permgen, but I have gotten those on command line.
[20:32] <mhwood> Batch run can be run at lower priority....
[20:33] <mhwood> (cron probably will do that anyway.)
[20:33] <tdonohue> When the tasks are "queued" they do *not* affect Tomcat (though I guess they could slow the entire server down if they use too much memory). When tasks are run immediately, they obviously do affect Tomcat
[20:33] <peterdietz> But the perform now.. I'll have to run some tests, so that hopefully it doesn't introduce a failure point
[20:33] <cknowles> we've had issues with filter media being resource intensive after large data loads
[20:34] <terryb2> We have processed some very large images for image zooming and needed to crank up the resources for filter media
[20:34] <terryb2> glad to hear about the queuing
[20:35] <tdonohue> In other words... Moving Media Filters to Curation Tasks doesn't change things too much... You can still do the processing overnight or whenever you want. But, it gives the additional option of running (smaller tasks) immediately.
[20:35] <mhwood> Queue runs can also run with different JAVA_OPTS.
[20:35] <mhwood> (Different memory limits, for example.)
[20:35] <tdonohue> +1 mhwood: yep, you could give queued runs extra memory as needed
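The difference between "run immediately" and "queued" discussed above can be sketched like this. It is a toy model, not the DSpace curation API: CurationQueueSketch and its methods are hypothetical names. The assumption is that runNow() executes in the caller's process (for example the webapp under Tomcat), while drain() is called later from a separate scheduled process that can have its own memory limits and priority.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: this is not the DSpace curation API.
class CurationQueueSketch {
    private final Queue<String> queued = new ArrayDeque<>();
    private final List<String> performed = new ArrayList<>();

    /** Runs the task at once, in the caller's process. */
    void runNow(String task) {
        performed.add("immediate:" + task);
    }

    /** Only records the task; no work happens until drain() is called. */
    void enqueue(String task) {
        queued.add(task);
    }

    /** Meant to be called from a separate, scheduled process (e.g. cron). */
    int drain() {
        int count = 0;
        String task;
        while ((task = queued.poll()) != null) {
            performed.add("batch:" + task);
            count++;
        }
        return count;
    }

    List<String> performed() {
        return performed;
    }

    int pending() {
        return queued.size();
    }

    public static void main(String[] args) {
        CurationQueueSketch tasks = new CurationQueueSketch();
        tasks.runNow("small-thumbnail-job");
        tasks.enqueue("large-thumbnail-job");
        System.out.println("pending before drain: " + tasks.pending());
        System.out.println("drained: " + tasks.drain());
        System.out.println(tasks.performed());
    }
}
```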
[20:35] <peterdietz> But can you run curate-community with 100,000 PDF's to curate/generate thumbs, which causes tomcat to be unresponsive...
[20:36] <mhwood> nice ionice bin/dspace curate....
[20:36] <peterdietz> Then I'll make a curation-task to restart tomcat...
[20:38] <tdonohue> peterdietz: In the curate.cfg file you can configure which Curation Tasks can be run from the Admin UI itself. If there are some tasks which are going to kill Tomcat every time, you may not make them available from the Admin UI (and instead just schedule them similar to media-filters).
[20:39] <tdonohue> Or, we improve the Curation Tasks Admin UI to provide better warnings, or perhaps even add the ability to disable "run immediately" for certain tasks
[20:40] <hpottinger> ooh.. a curation task to clear the XMLUI Cocoon cache :-)
[20:40] <mhwood> If needed, we can add thread priority to the task infrastructure.
[20:41] <mhwood> Anyway, this sounds good and I'm looking forward to using it.
[20:41] <tdonohue> So, back to DSPR#603 -- anyone want to help terryb2 get this ready to go? peterdietz are you already doing that, or do we need another volunteer?
[20:41] <kompewter> [ https://github.com/DSpace/DSpace/pull/603 ] - DS-2105: Create Thumbnails Using ImageMagick by terrywbrady
[20:43] <peterdietz> tdonohue: I've replied to #603 recently. Yeah, I'll keep following that ticket. I'll assume that nobody has major objections to it, and when it's all cleaned/satisfactory we can merge it in
[20:43] <tdonohue> peterdietz: sounds great. yea, I'd give my +1 once the cleanup is done
[20:43] <peterdietz> Sorry terryb2 that contributing features isn't as simple as pasting a patch, and disappearing.
[20:44] <mhwood> Any chance of using the existing thumbnail dimension properties, as was commented in the PR?
[20:44] <tdonohue> +1 to using existing configs in #603, if possible. I'd rather not have even more configs
[20:44] <tdonohue> For the sake of time here though, I'm going to move to the next ticket in our list (scrolling back): DS-1577
[20:44] <kompewter> [ https://jira.duraspace.org/browse/DS-1577 ] - [DS-1577] Replace dependency on commons-httpclient - DuraSpace JIRA
[20:46] <mhwood> I've tested 1577 lightly, but it touches features that I don't use and thus don't feel confident to test.
[20:46] <tdonohue> mhwood: RE: DSPR#475, it sounds like you need help and it may have broken SWORD client? what are the next steps?
[20:46] <kompewter> [ https://github.com/DSpace/DSpace/pull/475 ] - [DS-1577] 'commons-httpclient' is End-of-Life by mwoodiupui
[20:47] <mhwood> If someone who uses any of those features could try out the changes, I would really appreciate it.
[20:48] <tdonohue> I'm unclear if you (a) fixed all features, but it needs thorough testing, or (b) fixed some features, but need help on others?
[20:48] <robint> tdonohue mhwood I'll look at the Sword Client, maybe it could/should be decommissioned, but it won't be until I get back from holiday
[20:48] <mhwood> Thank you robint.
[20:48] <tdonohue> thanks robint!
[20:48] <mhwood> tdonohue, I updated all of those features but am unsure whether I got the more hairy fixes right.
[20:49] <mhwood> So, (a).
[20:49] <hpottinger> I'm wondering if we need to check interaction of DSPR#612 and DSPR#475
[20:50] <tdonohue> mhwood: OK, that helps. So, this needs thorough testing.
[20:50] <kompewter> [ https://github.com/DSpace/DSpace/pull/612 ] - DSpace ORCID (proper pull request) by KevinVdV
[20:50] <kompewter> [ https://github.com/DSpace/DSpace/pull/475 ] - [DS-1577] 'commons-httpclient' is End-of-Life by mwoodiupui
[20:50] <peterdietz> mhwood: from that sonatype blog, do we need to evaluate our usage of BouncyCastle?
[20:50] <mhwood> I don't clearly recall, but I think I decided that nothing else was scary. It wouldn't hurt to double-check the post, though.
[20:51] <tdonohue> peterdietz: might be worth a check, just in case. Likely needs a fresh ticket though, if we feel the need to replace it
[20:51] <mhwood> Is the ORCID code using httpclient?
[20:51] <mhwood> Or httpcomponents?
[20:51] <KevinVdV> I believe so
[20:52] <KevinVdV> Because we need to perform a query on the ORCID webservice
[20:52] <hpottinger> (I just saw "authority" on mhwood's list of things)
[20:52] <tdonohue> mhwood - sounds like you have some possible testers in the wings (with ORCID) :)
[20:53] <KevinVdV> Look for the “HttpClientFactory” class & its usages
[20:54] <tdonohue> So, it sounds like we need to figure out which comes first.... ORCID or upgrading HTTPClient.
[20:54] <tdonohue> (Both sound like they need to get in, but one is going to need to rework the other)
[20:56] <mhwood> I think the ORCID code may be using HttpComponents. Which is good.
[20:56] <tdonohue> Any chance @mire could help with HTTPClient testing (especially of things like Discovery & Solr Stats & also now ORCID, as needed)?
[20:57] * hpottinger wonders if there's a way to catch older dependencies on incoming PRs?
[20:57] <KevinVdV> Won’t be easy… we are still hard @ work on the metadata for all pull request
[20:57] <tdonohue> err...I mean, HTTPComponents testing (testing of PR#475)
[20:58] <hpottinger> I'm going to be testing ORCID code, I'll put HTTPcomponents/et al. check on my list of things to look at.
[20:58] <mhwood> Thank you.
[20:58] <tdonohue> hpottinger: that'd be great, thanks. I think we *have* to get PR#475 in...and we just need testers here.
[21:00] <tdonohue> So, we are nearly out of time here. We also had ORCID on our list above though: DSPR#612 Anything that we can discuss quickly here? or should we schedule it for next week?
[21:00] <kompewter> [ https://github.com/DSpace/DSpace/pull/612 ] - DSpace ORCID (proper pull request) by KevinVdV
[21:01] <hpottinger> reschedule for next week, it would be cool if we can get aschweer and scherler here, too
[21:01] <tdonohue> In general, where possible we may want to try and "schedule" certain tickets/PRs for discussion in these meetings (especially the biggest ones). I'm trying to keep up, but there's a LOT coming in right now, and it's hard to track it all. So, please feel free to send me things for next week's agenda
[21:01] <robint> Got to head off, cheers all
[21:02] <mhwood> Thanks for discussing 1577 today.
[21:02] * robint (5eaf588c@gateway/web/freenode/ip. Quit (Quit: Page closed)
[21:02] <tdonohue> hpottinger: sounds reasonable. I'll make a note to add it to next week's agenda. But, beware that next week is at 15UTC, which means middle of the night for aschweer
[21:03] <hpottinger> ah, yes... hmm... let's be kind and put it on the schedule for the next "late" meeting
[21:03] <tdonohue> That's it for today then. Have a good week all. I'll be hanging around a bit longer as needed, but no more official topics to discuss.
[21:03] <tdonohue> hpottinger: OK, I'll try to remember that, but please do feel free to remind me (or directly modify my agenda in two weeks) :)
[21:03] <KevinVdV> All right, I need to run, until next time !
[21:03] * KevinVdV (~KevinVdV@d5153D041.access.telenet.be) Quit (Quit: KevinVdV)
[21:04] <cknowles> bye
[21:04] <kompewter> see ya!
[21:04] * cknowles (~cknowles@cpc19-sgyl35-2-0-cust180.18-2.cable.virginm.net) has left #duraspace
[21:04] * hpottinger going back into distracted mode, will leave IRC on though and lurk, ping me if you need me
[21:04] <mhwood> As next "late" meeting is 14 days hence, we should dig into ORCID individually soon, and tidy up things that don't need wide discussion (which may be all).
[21:04] <tdonohue> +1 mhwood
[21:05] <hpottinger> +1, yes, please, test whatever interests you, comments can go right on the PRs
[21:05] <tdonohue> And, mhwood...I'll try and remember to put your DS-2107 on agenda for next week, if discussion doesn't start earlier.. I didn't forget about it
[21:05] <kompewter> [ https://jira.duraspace.org/browse/DS-2107 ] - [DS-2107] Provide a place for third-party plugins - DuraSpace JIRA
[21:06] <mhwood> Thanks!
[21:11] <peterdietz> Sorry for being post-meeting, but any objections to me just merging Andrea's bug fix: https://github.com/DSpace/DSpace/pull/586
[21:11] <kompewter> [ Ds 2077 sort collection dropdown by aschweer · Pull Request #586 · DSpace/DSpace · GitHub ] - https://github.com/DSpace/DSpace/pull/586
[21:12] <mhwood> No objection here.
[21:15] <mhwood> I just saw the sneakiest captcha to date: it modifies a picture of a mailbox or the side of a building with a random "house number".
[21:16] <hpottinger> +1 merge 586
[21:17] <tdonohue> seems fine peterdietz
[21:19] <terryb2> Could I ask the group a question about Lucene under DSpace 4?
[21:20] <terryb2> I read on some of the help pages that it is no longer required to run the update index process. I noticed that filter-media still runs that process by default.
[21:21] <terryb2> If I were to always run filter-media -n, would my system be impacted?
[21:24] * mhwood (mwood@mhw.ulib.iupui.edu) has left #duraspace
[21:24] <peterdietz> // update search index?
[21:24] <peterdietz> if (updateIndex)
[21:24] <peterdietz> {
[21:24] <peterdietz>     if (!isQuiet)
[21:24] <peterdietz>     {
[21:24] <peterdietz>         System.out.println("Updating search index:");
[21:24] <peterdietz>     }
[21:24] <peterdietz>     DSIndexer.setBatchProcessingMode(true);
[21:24] <peterdietz>     try
[21:24] <peterdietz>     {
[21:24] <peterdietz>         DSIndexer.updateIndex(c);
[21:24] <peterdietz>     }
[21:24] <peterdietz>     finally
[21:24] <peterdietz>     {
[21:25] <peterdietz>         DSIndexer.setBatchProcessingMode(false);
[21:25] <peterdietz>     }
[21:25] <peterdietz> }
[21:25] * kohts (5b4e6f33@gateway/web/freenode/ip. Quit ()
[21:25] <peterdietz> DSIndexer is only the lucene version. Yeah, if you use discovery, you're safe to run in -n mode. We should probably remove that requirement...
[21:26] <tdonohue> hmmm..yea, that looks like a bug in DSpace 4
[21:26] <terryb2> Thanks. I will submit a bug ticket to make the default filter-media behavior consistent
[21:26] <tdonohue> thanks, terryb2!
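The shape of the snippet pasted above (skip the index update when the flag is off, and always reset batch mode afterwards) can be tried in isolation with a stand-in indexer. FakeIndexer and finish() are hypothetical names made up for this sketch; it does not call DSIndexer and only assumes that "-n" maps to updateIndex being false, as the discussion suggests.

```java
// Hypothetical sketch: FakeIndexer stands in for the legacy indexer; it is not DSIndexer.
class FilterMediaIndexSketch {

    static class FakeIndexer {
        boolean batchMode = false;
        int updates = 0;
        boolean failOnUpdate = false;

        void updateIndex() {
            if (failOnUpdate) {
                throw new IllegalStateException("index update failed");
            }
            updates++;
        }
    }

    /** Mirrors the pasted snippet: do nothing at all when updateIndex is false. */
    static void finish(FakeIndexer indexer, boolean updateIndex) {
        if (!updateIndex) {
            return; // assumed "-n" case: leave the index alone
        }
        indexer.batchMode = true;
        try {
            indexer.updateIndex();
        } finally {
            indexer.batchMode = false; // reset even if the update throws
        }
    }

    public static void main(String[] args) {
        FakeIndexer indexer = new FakeIndexer();
        finish(indexer, false);
        System.out.println("updates after skipped run: " + indexer.updates);
        finish(indexer, true);
        System.out.println("updates after normal run: " + indexer.updates);
    }
}
```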
[21:26] <peterdietz> Does the discovery index stay up-to-date by listening to "Events"? Or do you need a cronjob to update discovery? Also, does discovery have extracted-text in its index?
[21:27] <terryb2> It would then require a new flag to invoke that process explicitly
[21:27] <terryb2> @peterdietz, this is part of my batch ingest process
[21:27] <tdonohue> Discovery listens to 'events'. So, it should auto-update itself. But there is a cron job to "optimize" the Discovery index which is recommended to schedule every once in a while
[21:28] <tdonohue> Discovery does index extracted text too.. we probably should verify that it is somehow updating when new text is extracted (by a media filter)...I *think* it is, but I don't know the details of that
[21:28] <terryb2> We have a bulk ingest process that triggers a filter media and index update (after indexing text files). Here is the script. https://github.com/Georgetown-University-Libraries/batch-tools/blob/master/bin-src/dspaceBatch.sh
[21:28] <kompewter> [ batch-tools/dspaceBatch.sh at master · Georgetown-University-Libraries/batch-tools · GitHub ] - https://github.com/Georgetown-University-Libraries/batch-tools/blob/master/bin-src/dspaceBatch.sh
[21:33] <hpottinger> +1 digging into exactly how our indexing works
[21:33] <terryb2> https://jira.duraspace.org/browse/DS-2111
[21:33] <kompewter> [ [DS-2111] Do not update Lucene by default in filter-media - DuraSpace JIRA ] - https://jira.duraspace.org/browse/DS-2111
[21:33] <kompewter> [ https://jira.duraspace.org/browse/DS-2111 ] - [DS-2111] Do not update Lucene by default in filter-media - DuraSpace JIRA
[21:34] <tdonohue> It looks like Discovery does index automatically... it listens to basically *any* change on any object here: https://github.com/DSpace/DSpace/blob/master/dspace/config/dspace.cfg#L708
[21:34] <kompewter> [ DSpace/dspace.cfg at master · DSpace/DSpace · GitHub ] - https://github.com/DSpace/DSpace/blob/master/dspace/config/dspace.cfg#L708
[21:35] <tdonohue> That config auto-calls the Discovery IndexEventConsumer : https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/discovery/IndexEventConsumer.java
[21:35] <kompewter> [ DSpace/IndexEventConsumer.java at master · DSpace/DSpace · GitHub ] - https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/discovery/IndexEventConsumer.java
[21:36] <tdonohue> The IndexEventConsumer loads up the indexer (via Spring), which ends up being SolrServiceImpl.....and that class "writes" the Document to Solr (including any full-text Bitstreams) hereish: https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/discovery/SolrServiceImpl.java#L687
[21:36] <kompewter> [ DSpace/SolrServiceImpl.java at master · DSpace/DSpace · GitHub ] - https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/discovery/SolrServiceImpl.java#L687
[21:37] <tdonohue> Long story short....as soon as "item.update()" gets called by the MediaFilter, Discovery should auto-index the updates into Solr
[21:38] <terryb2> Does that happen on a batch ItemImport? If so, I can simplify my post bulk ingest processing.
[21:40] <tdonohue> Discovery should re-index whenever an Item is Created/Modified/Deleted/Removed from a Collection, or when a Community/Collection is Created/Modified/Deleted. That's what this configuration setting is about: https://github.com/DSpace/DSpace/blob/master/dspace/config/dspace.cfg#L708
[21:40] <kompewter> [ DSpace/dspace.cfg at master · DSpace/DSpace · GitHub ] - https://github.com/DSpace/DSpace/blob/master/dspace/config/dspace.cfg#L708
[21:41] <tdonohue> Essentially that configuration says (in human terms)...Whenever a Community/Collection/Item/Bundle is EITHER Added/Created/Modified/Metadata-Modified/Deleted/Removed, then call the indexer for that object
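The filter described above (certain object types combined with certain event types trigger a re-index) can be sketched as a small consumer. Everything here is illustrative: the enums, Event and IndexConsumer are hypothetical and are not DSpace's event classes, and BITSTREAM and READ are included only as assumed examples of things the filter would ignore.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: these types are illustrative, not DSpace's event API.
class IndexEventSketch {
    enum ObjectType { COMMUNITY, COLLECTION, ITEM, BUNDLE, BITSTREAM }

    enum EventType { ADD, CREATE, MODIFY, MODIFY_METADATA, DELETE, REMOVE, READ }

    static class Event {
        final ObjectType subject;
        final EventType type;
        final String id;

        Event(ObjectType subject, EventType type, String id) {
            this.subject = subject;
            this.type = type;
            this.id = id;
        }
    }

    /** Records which objects would be handed to the indexer. */
    static class IndexConsumer {
        private final Set<ObjectType> subjects = EnumSet.of(
                ObjectType.COMMUNITY, ObjectType.COLLECTION,
                ObjectType.ITEM, ObjectType.BUNDLE);
        private final Set<EventType> types = EnumSet.of(
                EventType.ADD, EventType.CREATE, EventType.MODIFY,
                EventType.MODIFY_METADATA, EventType.DELETE, EventType.REMOVE);
        final List<String> reindexed = new ArrayList<>();

        void consume(Event event) {
            // Only events matching both filters reach the indexer.
            if (subjects.contains(event.subject) && types.contains(event.type)) {
                reindexed.add(event.id);
            }
        }
    }

    public static void main(String[] args) {
        IndexConsumer consumer = new IndexConsumer();
        consumer.consume(new Event(ObjectType.ITEM, EventType.MODIFY, "item-1"));
        consumer.consume(new Event(ObjectType.ITEM, EventType.READ, "item-2"));
        System.out.println(consumer.reindexed);
    }
}
```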
[21:41] <tdonohue> BTW: On a different topic, a new StackOverflow question tagged "dspace": http://stackoverflow.com/questions/25413032/how-to-reorder-dri-divs-in-dspace-xmlui
[21:41] <kompewter> [ How to reorder DRI divs in DSpace XMLUI - Stack Overflow ] - http://stackoverflow.com/questions/25413032/how-to-reorder-dri-divs-in-dspace-xmlui
[21:46] * peterdietz (~peterdiet@server112.longsightgroup.com) Quit (Quit: peterdietz)
[21:57] * tdonohue (~tdonohue@c-98-215-0-161.hsd1.il.comcast.net) has left #duraspace
[22:03] * terryb2 (~anonymous@ Quit (Ping timeout: 245 seconds)
[22:03] * peterdietz (~peterdiet@server112.longsightgroup.com) has joined #duraspace
[22:17] * hpottinger (~hpottinge@ Quit (Quit: Later, taterz!)
[23:10] * peterdietz (~peterdiet@server112.longsightgroup.com) Quit (Quit: peterdietz)
[23:22] * tdonohue (~tdonohue@c-98-215-0-161.hsd1.il.comcast.net) has joined #duraspace
[23:22] * tdonohue (~tdonohue@c-98-215-0-161.hsd1.il.comcast.net) has left #duraspace

These logs were automatically created by DuraLogBot on irc.freenode.net using the Java IRC LogBot.